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Statistical  models  are  useful  and  convenient  tools  for  describing  data. 
However,  models  can  be  adversely  effected  by  mistaken  information,  poor 
assumptions,  and  small  subsets  of  influential  data.   Seemingly  inconsequential 
changes  to  the  observed  data,  the  postulated  model,  or  the  inference  technique  may 
greatly  affect  the  scientific  conclusions  made  from  model-fitting.  This  dissertation 
considers  local  assessment  of  perturbations  in  parametric  and  semi-parametric 
models.  The  effects  of  perturbation  are  assessed  using  a  Taylor-series  approximation 
in  the  vicinity  of  the  fitted  model,  leading  to  graphical  diagnostics  that  allow  the 
analyst  to  identify  sensitive  parameter  estimates  and  sources  of  undue  influence. 

First,  computational  formulas  are  given  to  assess  the  influence  of  perturbations 
on  linear  combinations  of  the  parameter  estimates  for  maximum  likelihood.   A 
diagnostic  plot  that  identifies  sensitive  fitted  values  and  self-predicting  observations 
in  linear  regression  is  introduced.  Second,  the  eigensystem  of  the  acceleration  matrix 
for  perturbing  each  element  of  the  design  matrix  in  linear  regression  is  derived. 

vii 


This  eigensystem  leads  to  a  diagnostic  plot  for  identifying  specific  elements  that 
are  influential,  and  it  is  shown  how  the  results  relate  to  principal  components, 
collinearity,  and  influence  diagnostics. 

Next,  it  is  shown  that  the  conceptual  framework  and  computational  tools 
easily  extend  from  maximum  likelihood  to  inference  that  is  performed  using 
estimating  equations.  In  particular,  local  assessment  of  parameter  estimates  and 
standard  errors  derived  from  optimal  estimating  functions  is  addressed.  Algebraic 
expressions  and  computational  formulas  are  given  for  both  first-  and  second-order 
assessment,  and  applications  such  as  quasi-likelihood  are  considered. 

Finally,  the  techniques  are  applied  to  linear  mixed  models  in  order  to  assess 
the  influence  of  individual  observations  and  clusters.  While  previous  applications 
considered  only  estimation,  the  effects  of  perturbations  on  both  estimation  and 
prediction  is  addressed.  It  is  shown  that  the  effects  on  the  fixed  and  random  effect 
inferences  can  be  assessed  simultaneously  by  using  an  influence  measure  based  on 
the  unstandardized  predictive  log-likelihood  of  the  random  effects. 
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CHAPTER  1 
INTRODUCTION  TO  LOCAL  INFLUENCE      ;   .' 

1.1     Premise 

George  Box  (1982,  p.  34)  stated  "all  models  are  wrong;  some  models  are 

useful,"  and  A.  F.  Desmond  (1997b,  p.  117)  stated  "a  robust  method  ought  not  to 

perform  poorly  in  neighborhoods  of  the  nominally  true  model."  Indeed,  conclusions 

based  on  a  statistical  model  should  be  reliable  under  small  perturbations  to  the 

data,  the  model's  assumptions  or  the  particulars  of  the  estimation  method.  In  a 

seminal  paper.  Cook  (1986)  utilized  the  basic  concept  of  perturbing  the  postulated 

model  or  the  observed  data  to  develop  a  variety  of  diagnostic  tools.  He  showed  how 

small  perturbations  can  be  used  to  develop  easily-computable  diagnostics  for  almost 

any  model  estimated  by  maximum  likelihood  (ML)  estimation. 

1.2     Measuring  the  Effects  of  Perturbations 
In  a  statistical  analysis,  parameter  estimates  are  a  crucial  part  of  the 
decision-making  process.  These  estimates  are  a  function  of  the  observed  data,  the 
postulated  model,  and  the  analyst's  choice  of  statistical  inference  technique.  Any 
one  of  these  three  aspects  of  a  statistical-modeling  situation  is  subject  to  uncertainty 
and  doubt:  observed  data  may  be  incorrectly  recorded  or  subject  to  rounding 
error,  a  slightly  different  model  may  be  just  as  plausible  as  the  postulated  model, 
and  nuisance  aspects  of  modeling  such  as  choosing  hyperparameters  are  subject  to 
second-guessing.  If  small  changes  in  these  aspects  of  a  statistical  modeling  problem 
create  large  changes  in  parameter  estimates,  the  analysis  is  unreliable.  Therefore, 
the  change  in  parameter  estimates  due  to  perturbation  can  serve  as  a  barometer  for 
the  robustness  of  the  scientific  conclusions. 


Following  Cook  (1986),  a  vector  of  q  perturbations  is  denoted  as 
u  =  {u}i,U2, .  ■  ■  ,ujq)',  and  the  g-dimensional  space  of  possible  perturbations  is 
denoted  as  ft.    Included  in  this  space  is  one  point,  wo,  that  does  not  change 
the  original  formulation  of  the  problem,  and  this  is  called  the  null  perturbation. 
The  perturbations  are  not  parameters  to  be  estimated,  but  rather  they  are 
known  real-valued  quantities  introduced  into  the  problem  by  the  analyst.  These 
perturbations  might  represent  measurement  errors,  unequal  variances,  case  weights, 
correlated  errors,  missing  predictors  or  some  other  change  to  the  data,  model  or 
inference  method.  Prudent  choice  of  perturbations  can  lead  to  useful  diagnostics. 
For  example,  if  parameter  estimates  are  sensitive  to  perturbing  the  contribution 
(weight)  of  a  particular  observation  to  estimation,  it  indicates  that  the  observation 
is  overly  influential  on  the  estimates. 

For  data  y  and  parameters  6,  let  L{0]  y)  denote  the  log-likelihood  from 
the  original  postulated  model,  and  let  L^{9\y)  denote  the  log-likelihood  resulting 
from  the  inclusion  of  perturbations.    If  the  perturbation  specified  is  the  null 
perturbation,  then  the  two  likelihoods  are  equivalent.   The  resulting  maximum 
likelihood  estimators  (MLEs)  of  these  two  likelihoods  are  denoted  as  0  and  0^^, 
respectively.  Cook  (1986)  used  the  likelihood  displacement  to  summarize  parameter 
estimate  changes  under  perturbation: 

LD(u;)  =  2[L(0;y)-L(0,;j/)].  (1.1) 

From  inspection  of  (1.1),  we  see  that  the  likelihood  displacement  is  the  difference  of 
the  original  likelihood  evaluated  at  two  values:  the  original  MLEs  and  the  perturbed 
MLEs.  Because  0  maximizes  the  original  likelihood,  LD{u})  is  non-negative  and 
achieves  its  minimum  value  of  zero  when  u;  =  Wq-  As  the  perturbed  MLEs  become 
increasingly  different  from  the  original  MLEs,  LD{u))  becomes  larger.  Thus,  the 


size  of  LD{u})  provides  a  measure  of  how  much  the  perturbation  affects  parameter 
estimation. 

Effects  on  a  subset  of  parameters,  Oi  from  0'  =  {e[,  O'^),  can  be  obtained  by 
using  the  profile  likelihood  displacement: 

LD^''^{Lj)  =  2[L{e,,02;y)-L{0i^,g{ei^y,y)],  (1-2) 

where  Oi^  is  from  9^  =  {0[^,  9^^,  and  g{9i^)  is  the  vector  that  maximizes  the  profile 
log-likelihood,  L{9i^,92),  with  respect  to  02-  The  profile  likelihood  displacement 
allows  influence  to  be  "directed"  through  9i,  with  the  other  parameter  estimates 
only  changing  in  sympathy. 

In  the  next  section  it  is  shown  how  to  obtain  diagnostic  tools  from  a 
local  influence  analysis  of  the  likelihood  displacement  and  the  profile  likelihood 
displacement.       "*-:..'%>•     _v.      -^ 

«-     '■   ■•        t-       ''  •   :  .  ■■*■ 

'^.^     Vr,  V>'    1.3     Local  Influence  Diagnostics 
.-  Although  we  could  simply  choose  values  for  a?  and  compute  LD{lj),  this 
is  not  particularly  useful  for  two  reasons.  First,  LD{u:)  is  often  not  a  closed-form 
function  of  w.   Hence,  if  we  wish  to  find  its  value,  re-fitting  the  model  may  be 
necessary.  Second,  even  if  calculating  LD{u})  was  easy,  we  must  somehow  decide 
what  values  to  use  for  w.   This  choice  would  be  highly  data-dependent  and  a 
nuisance  for  the  analyst. 

Therefore,  we  would  like  a  methodology  that  can  examine  the  information 
contained  in  the  function  LD{u})  and  yet  does  not  require  a  choice  of  actual  values 
for  a>  nor  involve  re-estimating  the  model  parameters.   Following  Cook  (1986), 
this  can  be  accomplished  using  the  notion  of  curvature.  Simple  plots  can  then  be 
used  to  identify  whether  estimation  is  sensitive  to  any  particular  elements  of  the 
g-dimensional  perturbation. 


■B^?;.  ■  •'  i,r-^..i?!■ 
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Figure  1.1:  Schematic  of  a  plane  curve  with  its  tangent  line.  The  angle  between  the 
tangent  line  and  the  x-axis  is  (f). 


1.3.1     Curvature  of  a  Plane  Curve 

Curvature  is  a  measure  of  how  quickly  a  curve  turns  or  twists  at  a  particular 
point  (Swokowski  1988).  Let  {f{t),g{t))  be  a  smooth  plane  curve.  The  arc  length 
parameter  s  of  such  a  curve  is  the  distance  travelled  along  the  curve  from  a  fixed 
point  A  to  another  point  on  the  curve  (Figure  1.1).  Letting  Ia  denote  the  value  of  t 
that  yields  point  A,  s  can  be  computed  as 


{t)=  [\r{u)f  +  {g'{u)f)Uu, 

JtA 


(1.3) 


and  we  may  express  the  curve  in  terms  of  s:  {f{s~^{s)),g{s~^{s))).  Additionally,  let 
T{s)  be  the  tangent  line  at  a  point  on  the  curve,  and  let  (j)  denote  the  angle  between 
T{s)  and  the  x-axis. 

Curvature  is  defined  as  the  absolute  value  of  the  rate  of  change  in  0  with 
respect  to  s: 

50 


C  = 


ds 


(1.4) 


A  plane  curve  that  is  turning  quickly  at  {f{t),g{t))  has  a  larger  curvature  than  a 
curve  that  is  nearly  straight  at  that  point. 


!  Ct 


Figure  1.2:  Curve  with  two  circles  of  curvature 


The  reciprocal  of  curvature  is  called  the  radius  of  curvature,  and  it  is  the 
radius  of  the  best  fitting  circle  at  that  point  on  the  curve.  These  circles  of  curvature 
have  the  same  first  and  second  derivatives  and  tangent  line  as  the  curve  and  lie  on 
its  concave  side.  Figure  1.2  shows  the  curve  (|sin(f),  \sm{2t))  and  two  of  its  circles 
of  curvature.  The  one  on  the  left  occurs  when  t  =  ^  while  the  other  one  occurs 
when  t  =  arccos(O).  Their  curvatures  are  6  and  8/9  respectively.  The  expression 
"turning  on  a  dime"  refers  to  to  a  sharp  turn:  one  in  which  the  radius  of  curvature 
is  small,  and  consequently  the  curvature  of  the  path  is  large. 

The  formula  for  curvature  is 


\f'{t)g"{t)-g'{t)f"{t) 


C 


We  are  particularly  interested  in  assessing  curvature  at  the  local  minimum  of  a 
function  from  x  to  y,  i.e.  when  x  =  t.  For  this  special  case,  curvature  measures 
how  quickly  the  function  increases  from  that  minimum  value.  In  this  case,  (1.5) 
simplifies  to  the  acceleration  of  the  curve: 

d'g{x) 


(1.5) 


C  = 


dx^ 


(1.6) 


X^Xq 


Figure  1.3:  Likelihood  displacement  with  directional  vectors 

where  g{xo)  is  the  function's  minimum.  This  result  can  be  applied  to  LD{u})  at  u;o 
when  there  is  only  a  single  perturbation,  lo.  Here,  the  curvature  is  given  by 

:■-  .  \.  d^LD{uj) 


doj^ 


(1.7) 


Ul=(jJo 


and  it  measures  how  quickly  the  parameter  estimates  change  when  u  is  moved 
away  from  the  null  perturbation.  A  large  curvature  indicates  that  even  if  uj  is  close 
to,  or  local  to  ljq,  the  likelihood  displacement  increases  substantially.   Hence,  a 
large  curvature  indicates  that  even  a  slight  perturbation  can  be  very  influential  on 
parameter  estimates. 

1.3.2     Using  Plane  Curves  to  Describe  LDju:) 

The  curvature  of  plane  curves  can  also  be  used  to  describe  LD{u})  when 
q  >  1.   Now  u>  can  be  moved  away  from  wq  in  different  directions,  as  shown  by 
the  vectors  in  Figure  1.3  for  a  two-dimensional  perturbation.  These  vectors  can  be 
written  as 


U3v{a)  =  Wo  +  av, 


(1.8) 


Figure  1.4:  Intersection  of  plane  and  surface  creates  space  curve 

where  the  vector  v  determines  the  direction  and  the  value  a  determines  the  position 
relative  to  wq-   Let  Py  denote  a  plane  that  contains  u}y{a)  and  extends  directly 
upward  from  ft  to  intersect  LD{(jj)  (Figure  1.4).  The  intersection  of  Pt;  and  LD{u}) 
forms  a  space  curve,  (a;„(a)',LZ)(u;„(a)))^  .  Because  this  space  curve  is  contained 
within  the  plane  Py,  its  curvature  can  be  obtained  as  if  it  were  a  plane  curve, 
(a,  LD{u}y{a)).  Thus  the  curvature  of  LD{u:y{a))  at  Wo  is  simply 

d^LD{u}y{a)) 


L/y     — 


dd^ 


(1.9) 


a=0 


Different  directions  produce  curvatures  of  various  sizes.  The  normalized  vector  that 
produces  the  largest  curvature  is  denoted  Umax  and  is  of  particular  interest. 


1.3.3     Direction  of  Maximum  Curvature,  v 


1   '^max 


The  relative  sizes  of  the  elements  of  Vmax  indicate  which  elements  of  ui  are 
most  influential  on  the  parameter  estimates.  For  example,  if  the  i*'*  element  of  Uma: 


^  More  formally,  these  space  curves  are  called  normal  sections.  The  term  comes 
from  the  fact  that  each  plane,  Py,  contains  the  surface's  principal  normal  vector,  the 
vector  that  is  orthogonal  to  the  surface's  tangent  plane  at  ijJq.  Curvatures  of  the 
normal  sections  are  referred  to  as  normal  curvatures. 


8 

is  large,  then  Wj  is  a  key  player  in  producing  the  largest  curvature  of  LD(a;)  at  the 
null  perturbation,  i.e.  in  producing  an  immediate  change  in  parameter  estimates. 

The  direction  of  maximum  curvature  is  the  main  diagnostic  of  Cook's  local 
influence  methodology.  Typically,  Vmax  is  standardized  to  have  length  one  and  is 
then  plotted  against  indices  l,2,...,q.  If  any  elements  are  particularly  large  relative 
to  the  others,  they  stand  out  in  the  plot.  The  analyst  then  may  want  to  take  steps 
to  alleviate  the  sensitivity  to  the  perturbation.  What,  if  any,  steps  to  be  taken 
depend  upon  the  perturbation  scheme  and  the  particulars  of  the  data  analysis. 

For  some  models  and  perturbation  schemes,  the  formula  for  Vmax  is  known 
explicitly,  while  in  other  situations  it  requires  some  moderate  computations.  In 
analyses  such  as  regression,  Vmax  is  typically  a  function  of  familiar  quantities. 

1.4     Perturbations  in  Regression 
Consider  the  normal  linear  regression  model: 

Y  =  Xl3  +  e,  (1.10) 

where  V  is  a  n  x  1  vector  of  responses,  X  is  a  known  n  x  k  model  matrix  of  rank  k, 
P  is  a  k  X  1  vector  of  unknown  parameters,  and  e  is  a  n  x  1  vector  of  independent 
identically  distributed  normal  random  variables  with  mean  0  and  variance  cr^  >  0. 
Hence,  Y  ~  N{X^, a2j„),  and  0  =  (/?i, ...,l3k, a^)'. 

1.4.1     Scheme  1:  Measurement  Error  in  the  Response 

In  order  to  examine  the  effects  of  measurement  error  in  the  response, 
Schwarzmann  (1991)  introduced  perturbations  to  the  observed  data.  He  considered 
fitting  a  model  to  the  observed  data  set  with  slight  perturbations:  tji  +  a;,-,  for 
i  —  1,. . .  ,q.  Hence,  ojo  is  a  vector  of  zeroes  for  this  perturbation  scheme.  By  letting 
q  —  n,  all  of  the  observations  can  be  perturbed  simultaneously.  This  allows  the 
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Table  1.1:  Scottish  hills  race  data 


Observation 

Record  Time  (min.) 

Distance  (miles) 

Climb  (ft) 

1 

16.083 

2.5 

650 

2 

48.350 

6.0 

2500 

3 

33.650 

6.0 

900 

4 

45.600 

7.5 

800 

5 

62.267 

8.0 

3070 

6 

73.217 

8.0 

2866 

7 

204.617 

16.0 

7500 

8 

36.367 

6.0 

800 

9 

29.750 

5.0 

800 

10 

39.750 

6.0 

650 

11 

192.667 

28.0 

2100 

12 

43.050 

5.0 

2000 

13 

65.000 

9.5 

2200 

14 

44.133 

6.0 

500 

15 

26.933 

4.5 

1500 
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analyst  to  determine  which  observations  would  have  the  most  effect  on  parameter 
estimates  in  the  event  that  there  is  measurement  or  rounding  error. 
1.4.1.1     Influence  on  the  full  vector  of  parameter  estimates 

Atkinson  (1986)  analyzed  a  data  set  of  record  times  for  35  diff'erent  hill  races, 
as  recorded  by  the  Scottish  Hill  Runners  Association  (Table  1.1).   Scatter  plots 
indicate  the  positive  association  of  the  three  variables:  winning  time,  race  distance 
and  race  climb.   (Figure  1.5).   The  influence  of  perturbations  on  the  parameter 
estimates  from  a  regression  of  winning  time  on  DISTANCE  and  CLIMB  is  of 
interest.  For  convenience,  the  regressors  and  response  have  been  centered  and  scaled, 
thereby  eliminating  the  need  for  an  intercept.  Thus,  the  full  vector  of  p  parameters 

is  9  =  (/^distj^cUmbjO"  )'• 

Figure  1.6  shows  Vmax  plotted  against  the  indices  1  through  g  =  n  =  35  for 
the  Scottish  hills  data.  Element  18,  which  corresponds  to  perturbation  of  the  18*'* 
observation,  is  the  dominant  contributer  to  the  direction  of  maximum  curvature. 
This  observation  is  from  a  short  race  with  little  CLIMB,  and  yet  it  has  a  large 
record  time.  Similar  races  have  much  smaller  record  times,  indicating  that  race  18 
is  an  outlier.  Atkinson  (1986)  has  actually  suggested  that  the  winning  time  for  this 
race  may  have  been  18.67  minutes  rather  than  1  hour  and  18.67  minutes. 

Interestingly,  it  can  be  shown  that  the  direction  of  maximum  curvature  for 
Scheme  1  is  proportional  to  the  residuals.   Thus,  if  we  standardized  the  vector 
of  residuals  to  length  1  and  plotted  them  against  case  indices,  we  would  obtain 
Figure  1.6.  This  implies  that  parameter  estimates  are  most  sensitive  to  a  set  of 
small  additive  measurement  errors  when  those  errors  tend  to  exaggerate  (or  correct) 
the  model's  error  in  estimating  each  data  point,  with  the  most  exaggeration  (or 
correction)  occurring  for  the  most  poorly  predicted  points. 
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Figure  1.5:  Hills  data;  scatter-plots  with  highlighted  cases 
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Figure  1.7:  Hills  data;  response  perturbation 

1.4.1.2  Influence  on  a  subset  of  parameter  estimates 

Plotting  Vmax  for  the  profile  likelihood  displacement,  LD^^^^{(jj),  can  indicate 
which  components  of  a;  affect  a  subset  of  parameters.   Examining  influence  on 
§1  =  (7^  results  in  the  same  Umax  as  the  original  likelihood  displacement,  i.e.  it  is 
proportional  to  the  residuals.  The  directions  of  maximum  curvatures  for  6i  =  ^dist 
and  §1  =  /3cUmb  are  shown  in  Figures  1.7(a)  and  1.7(b).  The  first  plot  indicates  that 
small  measurement  error  in  race  11 's  winning  time  is  most  influential  on  ^dist-  In  the 
plot  of  Umax  for  /^cUmb,  elements  7  and  11  stand  out.  The  fact  that  the  two  elements 
have  opposite  signs  can  be  used  for  interpretation.  They  indicate  that  maximum 
curvature  is  primarily  achieved  by  perturbing  winning  times  7  and  11  in  opposite 
directions  (i.e.  making  one  observation  smaller  and  the  other  larger).  Thus,  if  the 
record  winning  times  for  races  7  and  11  were  slightly  closer  together  or  farther  apart, 
this  would  aff'ect  the  slope  estimate  for  CLIMB. 

1.4.1.3  Plotting  options 

Note  that  the  standardized  i>max  is  only  unique  up  to  sign  change.   For 
example.  Figure  1.8  displays  the  same  standardized  vector  as  Figure  1.6,  but  with 
all  of  the  elements'  signs  switched.  This  suggests  an  alternative  plot  in  which  the 
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signs  of  the  elements  are  suppressed:  a  star  plot.  Here,  each  element  is  displayed 
as  a  point,  with  the  distance  from  the  origin  equal  to  its  absolute  value.  The  q 
elements  are  simply  displayed  around  the  point  (0,0),  starting  in  the  first  quadrant 
and  ending  in  the  fourth  quadrant.  The  plot  is  completed  by  labelling  the  points 
with  indices.  Labeling  all  points  can  result  in  cluttered  plot  (1.9(a)),  and  so  a  better 
option  is  to  only  label  the  largest  elements  (1.9(b)).  In  these  renderings,  element 
18  is  again  noticeably  large,  but  our  attention  is  also  drawn  to  element  7.  This 
observation  has  the  characteristics  of  an  influential  observation:  it  has  the  highest 
climb  and  one  of  the  longest  distances  (Figure  1.5(c)),  as  well  as  the  second  largest 
residual. 

Unlike  the  index  plot,  the  star  plot  does  not  give  the  illusion  that  the 
indices  are  in  a  specific  order.  However,  the  suppression  of  element  signs  can  be  an 
advantage  or  a  disadvantage.  The  advantage  is  that  when  looking  at  several  plots 
of  directional  vectors,  suppressing  the  signs  helps  the  analyst  identify  patterns  by 
emphasizing  element  size.  The  disadvantage  is  that  the  diagnostic  interpretation  of 
the  signs  is  lost. 
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Figure  1.9:  Hills  data;  |vmax|  as  a  star  plot 


1.4.2     Scheme  2:  Perturbations  to  Measurement  Precision 

Another  assumption  that  can  be  perturbed  is  that  of  equal  variance  (Cook 

2 

1986).  This  is  accomplished  by  letting  V{yi)  =  ^  for  i  =  1, . . . ,  n.  Here  the  null 
perturbation  is  Wq  =  In- 

With  this  perturbation  scheme,  we  deem  that  some  observations  are  observed 
with  more  precision  than  others.  The  resulting  perturbed  MLEs  weight  each  case 
according  to  its  perturbed  variance.  If  certain  cases  are  influential,  then  even  a 
slight  down-weighting  or  up-weighting  of  them  produces  large  displacements. 

This  perturbation  scheme  has  an  important  connection  to  the  widely-used 
influence  measure  Cook's  Distance  (Cook  1977).  Recall  that  the  i*^^  Cook's  Distance 
is  a  measure  of  the  change  in  regression  parameter  estimates  that  results  from 
deleting  the  i*''  observation.  Clearly  a  deleted  observation  provides  no  information  for 
finding  /9.  Similarly,  as  the  i"*  perturbation  is  decreased  from  1,  the  i*'*  observation 
provides  less  and  less  information  to  the  estimation  process  owing  to  its  increased 


:.      -   -  •  -.    ^        15 

variance.  Thus,  case-deletion  can  be  thought  of  as  a  rather  drastic  perturbation: 
one  that  results  in  a  variance  ->  oo  for  the  i*^  observation.  Case-deletion  is  referred 
to  as  a  global  influence  technique  because  it  assesses  the  effect  of  perturbations  that 
are  not  local  to  the  null  perturbation  in  ft. 

1.4.2.1  Influence  on  the  full  vector  of  parameter  estimates 

Figure  1.10(a)  shows  the  direction  of  maximum  curvature  for  the  Scottish 
hills  data  set.  Again,  element  18  dominates  the  direction  of  maximum  curvature. 
This  implies  that  the  precision  of  observation  18  is  the  most  influential  on  the  full 
set  of  parameter  estimates. 

1.4.2.2  Influence  on  a^ 

Using  the  profile  likelihood  displacement,  influence  directed  upon  a^  alone 
can  be  assessed.  Observation  18  is  again  implicated  by  Umax  as  being  most  locally 
influential  (Figure  1.10(b)). 

1.4.2.3  Influence  on  individual  regression  parameter  estimates 

Figure  1.10(d)  shows  Umax  when  isolating  effects  on  /^aist-  Although  there  are 
several  points  that  stand  out,  observation  7  is  identified  as  being  most  influential 
since  element  7  is  large.  This  race  is  also  implicated  as  being  locally  influential  in 
the  plot  of  Umax  for  Z^cUmb  (Figure  1.10(c)).  In  fact,  the  DEBET  AS  diagnostic 
(Belsley  et  al.  1980)  also  identifies  race  7  as  being  influential  on  /5cUmb-  Recall  that 
the  value  of  DEBET ASi^j  measures  how  the  f^  parameter  estimate  is  affected 
by  deleting  the  i^^  observation.  Writing  the  n  DEBET  AS  for  the  /''  parameter 
estimate  in  vector  form,  it  is  shown  in  Section  2.5  that  this  vector  is  proportional  to 
the  following  Hadamard  (element-wise)  product  of  vectors: 
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Figure  1.10:  Hills  data;  variance  perturbation 
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Figure  1.11:  Standardized  vector  oi  DFBETAS  for  /3cUmb 
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Here,  /i,i  is  the  i^^  diagonal  element  of  the  "hat"  matrix  X{X'X)~^X' ,  r^  is  the  i*'^ 
residual,  and  rij  is  the  z*''  residual  from  regressing  Xj  on  the  other  columns  of  X. 
It  can  be  shown  (see  Section  2.5)  that  Umax  for  directing  the  effects  of  the  variance 
perturbation  on  $j  is  proportional  to 
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The  elements  of  the  two  vectors  differ  only  by  a  factor  of  y^-  •  In  fact,  standardizing 
the  vector  of  35  DFBETAS  for  ^cUmb  to  length  1  and  plotting  them  (Figure  1.11) 
produces  a  figure  nearly  identical  to  Figure  1.10(c). 
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Influence  on  both  regression  parameter  estimates 

Figure  1.10(e)  displays  the  direction  of  maximum  curvature  for  directed 
influence  on  ^.  Plot  (f)  is  the  direction  of  largest  curvature  under  the  constraint  of 
orthogonality  to  Vmax-  The  two  plots  are  very  similar  to  the  directed  influence  plots 
in  (c)  and  (d). 

1.5     Basic  Perturbations 

As  an  alternative  to  finding  the  direction  of  maximum  curvature,  we  may 
utilize  basic  perturbations.   The  i*''  basic  perturbation  is  the  same  as  the  null 
perturbation  except  for  an  additive  change  of  1  in  its  i*''  element.  For  example,  the 
first  basic  perturbation  is  Wq  +  (1, 0, . . . ,  0)',  and  corresponds  to  moving  a  distance 
of  1  along  the  first  axis  in  fi.   Typically,  the  curvatures  for  each  of  the  q  basic 
perturbations  would  be  computed  and  then  plotted  together. 

As  an  example,  consider  applying  the  variance  perturbation  to  a  linear 
regression,  and  examining  directed  influence  on  ^.   In  this  case,  the  i*''  basic 
curvature,  Cfr.,  is  a  local  influence  analog  to  the  z*''  Cook's  Distance,  because  it 
corresponds  to  re- weighting  a  single  observation.  It  is  shown  in  Section  2.7.2.1  that 
the  i^^  basic  perturbation  for  this  scheme  measures  influence  on  the  i^^  fitted  value. 
Figure  1.12(a)  plots  the  35  curvatures  for  the  basic  perturbations  applied  to  the 
Scottish  hills  data,  while  Figure  1.12(b)  plots  the  35  Cook's  Distances.  Although 
the  scales  differ,  they  provide  similar  information.  Both  plots  draw  attention  to  race 
7,  but  they  differ  somewhat  in  their  emphasis  of  races  11  and  18. 

*  1.6     Brief  Literature  Review 

Local  influence  techniques  have  been  applied  to  a  number  of  statistical 
models,  most  notably  univariate  and  multivariate  linear  models  (Cook  1986;  Cook 
and  Weisberg  1991;  Lawrance  1991;  Schwarzmann  1991;  Tsai  and  Wu  1992;  Wu 
and  Luo  1993a;  Kim  1995;  Kim  1996c;  Tang  and  Fung  1996;  Fung  and  Tang  1997; 
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Figure  1.12:  Hills  data;  C^'s  for  variance  perturbation  and  Cook's  Distances 


Kim  1998;  Rancel  and  Sierra  1999).  A  number  of  articles  have  also  appeared  on 
applications  in  generalized  linear  models  (GLMs)  (Cook  1986;  Thomas  and  Cook 
1989;  Thomas  and  Cook  1990;  Thomas  1990;  Lee  and  Zhao  1997)  and  survival 
analysis  (Pettitt  and  Baud  1989;  Weissfeld  and  Schneider  1990;  Weissfeld  1990; 
Escobar  and  Meeker  1992;  Barlow  1997;  Brown  1997).  Other  applications  include 
linear  mixed  models  (Beckman  et  al.  1987;  Lesaffre  and  Verbeke  1998;  Lesaffre  et  al. 
1999),  principal  components  analysis  (Shi  1997),  factor  analysis  (Jung  et  al.  1997), 
discriminant  analysis  (Wang  et  al.  1996),  optimal  experimental  design  (Kim  1991), 
structural  equations  (Cadigan  1995;  Wang  and  Lee  1996),  growth  curve  models 
(Pan  et  al.  1996;  Pan  et  al.  1997),  spUne  smoothing  (Thomas  1991),  Box-Cox 
transformations  (Lawrance  1988;  Kim  1996b),  nonlinear  models  (St.  Laurent  and 
Cook  1993;  Wu  and  Wan  1994),  measurement  error  models  (Zhao  and  Lee  1995;  Lee 
and  Zhao  1996),  and  elliptical  linear  regression  models  (Galea  et  al.  1997). 

Many  authors  have  presented  modifications  of,  extentions  to,  and  theoretical 
justifications  for  the  basic  methodology  (Vos  1991;  Schall  and  Gonin  1991;  Schall 
and  Dunne  1992;  Farebrother  1992;  Wu  and  Luo  1993a;  Billor  and  Loynes  1993; 
Kim  1996a;  Fung  and  Kwan  1997;  Lu  et  al.  1997;  Cadigan  and  Farrell  1999;  Poon 
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and  Poon  1999).  Extentions  to  Bayesian  methodology  have  also  been  considered 
(McCulloch  1989;  Lavine  1992;  Pan  et  al.  1996). 

..'■■,"'  1.7    Outline 

In  Chapter  2,  computational  formulas  for  local  influence  analysis  are  discussed 
and  derived  for  the  two  perturbation  schemes  discussed  in  this  chapter.  Chapter  3 
contains  derivations  and  examples  of  local  influence  diagnostics  for  linear  regression 
under  perturbations  to  the  X  matrix.  In  Chapter  4,  the  methodology  is  extended  to 
a  general  class  of  influence  measures  and  estimation  techniques.  Computational  tools 
are  also  provided.  In  Chapter  5,  inference  and  local  influence  analysis  for  prediction 
problems  are  discussed.  Techniques  for  assessing  estimation  and  prediction  in  linear 
mixed  models  are  developed  and  applied  to  data.  In  Chapter  6,  some  preliminary 
results  that  can  be  applied  to  multi-stage  estimations  are  presented.  The  chapter 
also  contains  commentary  on  additional  areas  of  future  research.  The  appendices 
contain  details  for  data  analyses. 


CHAPTER  2 
COMPUTATIONAL  ISSUES  IN  LOCAL  INFLUENCE  ANALYSIS 

In  this  chapter,  computational  formulas  for  local  influence  are  reviewed 

(Cook  1986;  Beckman  et  al.  1987),  To  demonstrate  the  mechanics  of  the  formulas, 

detailed  derivations  of  previous  results  for  the  regression  model  (Cook  and  Weisberg 

1991;  Schwarzmann  1991)  will  be  presented  with  minor  additions.  Next,  the  issue 

of  how  to  obtain  objective  benchmarks  is  considered,  and  some  new  novel  solutions 

are  outlined.  Finally,  local  influence  is  discussed  under  reparamaterizations  of  the 

perturbation  space  ft  and  the  parameter  space  0.  By  applying  previous  results 

(Escobar  and  Meeker  1992;  Pan  et  al.  1997),  some  new  diagnostic  tools  for  influence 

on  fitted  values  in  linear  regression  are  derived. 

2.1     Curvatures  and  the  Direction  of  Maximum  Curvature 
Cook  (1986)  showed  that  the  curvature  of  LD{u})  in  direction  v  can  be 
expressed  as 

C^  =  2v'A'{-L~^)Av,  (2.1) 

where  —L  is  the  observed  information  matrix  of  the  original  model, 


-L 


deae' 


e^e 


and  Apx,  has  elements 


_d'ue-y) 


'^         dOidui 


O=0,u:=u}o 


for  i  =  1, . . .  ,p  and  j  =  1, . . .  ,q.  This  result  is  proved  in  Chapter  4  as  a  special  case 
of  Theorem  4.2. 
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The  qy~  q  matrix  F  =  2A'{-L     )A  is  the  acceleration  matrix.  It  is  easily- 
obtained  because  A  is  a  function  of  L^id;  y),  and  -L      is  typically  computed 
as  part  of  ML  estimation  since  it  serves  as  an  estimated  asymptotic  covariance 
matrix  for  the  parameter  estimates.  Hence,  computing  the  curvature  in  a  particular 
direction  is  straightforward.  In  addition,  the  diagonal  elements  of  the  acceleration 
matrix  are  the  q  basic  curvatures.   Finally,  it  is  well  known  how  to  obtain  the 
maximum  value  of  a  quadratic  form  such  as  (2.1)  (Johnson  and  Wichern  1992, 
pp.    66-68).   Specifically,  C^ax  is  the  largest  eigenvalue  of  F,  and  Umax  is  the 
corresponding  eigenvector.  Hence,  finding  Cmax  and  Umax  is  numerically  feasible. 

The  acceleration  matrix  can  provide  additional  information.  For  example,  the 
second  largest  eigenvalue  is  the  largest  curvature  in  a  direction  that  is  orthogonal 
to  Umax-  Hence,  its  corresponding  eigenvector  indicates  a  direction  of  large  local 
influence  that  is  unrelated  to  the  first  one.   Additional  elements  of  the  spectral 
decomposition  of  F  can  be  used  in  a  similar  fashion. 

A  consequence  of  this  derivation  is  that  it  shows  how  local  influence 
techniques  reduce  the  dimensionality  of  examining  LD{u})  via  a  small  number  of 
curvatures  and  directional  vectors.  It  is  well  known  that  the  number  of  non-zero 
eigenvalues  of  a  symmetric  matrix  such  as  F  is  equal  to  its  rank.   Since  it  is  a 
product  of  other  matrices,  the  rank  of  F  is  no  more  than  the  minimum  of  their 
ranks,  which  is  typically  p. 

In  the  regression  examples  given  previously,  p  =  3  and  g  =  35,  meaning  that 
the  local  influence  approach  can  summarize  the  35-dimensional  surface  LD{uj)  with 
three  curvatures  and  directional  vectors.  Similarly,  LD^^^uj)  was  summarized  with 
just  two,  as  shown  in  Figures  1.12(c)  and  (d). 

Despite  the  advantages  of  the  eigen-analysis  of  F,  one  difficulty  that  arises 
is  the  occurence  of  eigenvalues  with  multiplicities  greater  than  one.  For  example, 
if  C'max  =  Ai  has  multiplicity  greater  than  1,  then  there  is  no  unique  Vmax  to  plot. 
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In  these  situations  it  may  be  more  prudent  to  look  at  directed  influence  on  single 
parameter  estimates. 

2.2     Local  Influence  Analysis  of  a  Profile  Likelihood  Displacement 
The  profile  likelihood  displacement,  LD^^^\u}),  was  presented  in  Section 
1.2  as  a  measure  of  influence  directed  through  Oi.  Cook  (1986)  showed  that  the 
acceleration  matrix  can  be  expressed  as 


f''^  =  -2A'(£''  -  B22)A, 


(2.2) 


where 


-B22  — 

0 

0 

0 

^22 

and  L22  comes  from  the  partition 

•'^■>V''^  -    ■■'■■■'  ■■..  . 

J 

J 

^11 

L12 
L22 

w 


where 


d'L{e;y) 


''      dOM. 


9=0 


This  result  is  proved  in  Chapter  4  as  a  special  case  of  Theorem  4.2. 

Beckman  et  al.  (1987)  considered  directing  influence  on  a  single  parameter 
estimate.  They  noted  that  in  this  case,  the  acceleration  matrix  is  the  outer  product 
of  a  vector,  and  hence  has  only  one  non-zero  eigenvalue.  Without  loss  of  generality, 
suppose  that  we  wish  to  direct  influence  on  the  first  parameter  estimate  in  0.  The 
eigensystem  is  then  given  by 


A  =  2||A'T| 


V  oc  A'T, 


(2.3) 
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where  T  is  the  vector  for  which  TT'  =  {L     -  Baa)-  It  can  be  expressed  as 


(2.4) 


where  c  =  Ln  —  L12L22  L21. 

2.3     Building  Blocks  of  Local  Influence  for  Regression 
In  this  section,  the  building  blocks  of  F  for  a  multiple  linear  regression  are 
constructed.  Subsequent  sections  apply  these  results  to  the  response  and  variance 
perturbations  used  in  Chapter  1. 

2.3.1     Notation 

•  Unless  otherwise  noted,  the  derivations  assume  the  presence  of  an  inter- 
cept in  the  model  so  that  the  residuals  sum  to  zero.  Centering  and  scaling 
the  regressors  tends  to  faciliate  interpretation,  but  is  not  necessary. 

•  The  number  of  regressors  is  k  —  p  —  1. 

•  H  =  X{X'X)-^X. 

•  Xi  represents  the  i*^  column  of  X,  and  x\  represents  the  i*''  row  of  X. 

•  r  =  (/„  —  H)y  represents  the  residuals. 

•  Tj  represents  the  residuals  from  fitting  a  model  to  the  i*''  predictor  using 
the  other  predictors.  These  can  be  considered  as  values  of  an  adjusted 
predictor  (i.e.  adjusted  for  relationships  with  other  predictors).  Note: 
Hvi  =  Ti  and  (/  —  H)ri  =  0. 

•  X(j)  represents  the  X  matrix  with  the  i^^  column  removed.  Similarly, 
/S/^)  is  the  regression  parameter  vector  with  the  i*^  element  removed,  and 
A(i)  is  A  without  the  i"*  row. 


'"<.    %     * 


-.r-l 
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2.3.2     Acceleration  Matrix 

The  ML  estimates  of  0  are  the  same  as  those  of  ordinary  least  squares, 
3  =  {X'Xy^X'y  ,  and  the  ML  estimate  of  a^  is  ^.   The  p  x  p  estimated 
asymptotic  variance-covariance  matrix  of  the  MLEs  is  '  > 


L     = 


a2  {x'xy^    0 

0' 
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n 


Partitioning  A'  into  [A'^     A'^2],  the  acceleration  matrix  is  given  by 


F  =  2A'(-£     )A 


Aa' 


=  2a^A'^  {XX'y  Ap  + K,A,2 

lb 


2.3.3    Acceleration  Matrix  for  Directed  Influence  on  a^      -  •  • 

The  acceleration  matrix  for  directed  influence  on  a"^  will  utilize  the  matrix 
L      —  B22)  which  is  given  by 


0        0 


/  2,7'' 


0 


This  implies  that  the  acceleration  matrix  for  directed  influence  on  cr  is 

F^"'^  = -2A'(X"' -  B22)A 

=  — A;2A^2. 
n 


(2.5) 


Noting  that  A'^2  is  in  fact  a  9  x  1  vector,  there  will  be  a  single  eigenvalue  and 
eigenvector  given  by 


\        _    4ff''||    A' 


vi  oc  A'^2 


These  quantities  are  the  maximum  curvature  and  its  direction,  respectively. 
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2.3.4     Acceleration  Matrix  for  Directed  Influence  on  /3 
When  directing  influence  on  /3,  we  have 


L     —  B22  — 


-a^xx'y^  0 

C  0 


The  resulting  acceleration  matrix  is 

f^^=2a'A'f,{XX')-'Ap.  •      (2.6) 

Note  that  this  implies  that  directed  local  influence  on  ^  is  equivalent  to  local 
influence  for  /9  when  treating  a^  as  known  and  equal  to  its  MLE. 

2.3.5     Acceleration  Matrix  for  Directed  Influence  on  a  Single  Regression  Parameter 
By  applying  the  general  results  of  Beckman  et  al.  (1987),  we  can  construct  T 
(2.4)  for  directed  influence  on  $1  in  a  regression  model.  Using  results  for  the  inverse 
of  a  partitioned  matrix,  TT'  =  -{L     -  B22)  can  be  expressed  as 


-(X(i)X(i))     X(i)Xi 
0 


-XiX(i)(Xji)X(i))  ^  0 


(2.7) 


where  d  =  X'lXi  -  XiX(i)(X'(i)X(i))   ^X'^^^Xi. 

We  denote  {X'/^-.X (^i))~^ X'^^^^X i  by  7^,  as  it  is  the  regression  parameter 
estimates  from  modeling  Xi  as  a  linear  combination  of  the  other  explanatory 
variables.  This  simplifies  the  notation  of  (2.7)  greatly: 


ir' -  B22)  =  TT^, 

Fi 


1 

-t'i     0 

7i 

7i7i    0 

0 

0'       0 

(2.8) 
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Using  these  results,  we  find 


Fil 


1 

-7i 
0 


(2.9) 


and  this  leads  to  the  following  expressions  for  the  maximum  curvature  and  its 
direction  when  directing  influence  on  /5i : 

C^ax  =  2||A'TI|  =  j^\\A'^,  -  A^^^^Till         t^max  oc  A'T  a  A^^  -  A^^^^t^. 

The  next  two  sections  apply  these  results  to  document  more  completely  the 
two  perturbation  schemes  given  in  Chapter  1. 

2.4    Perturbations  to  the  Response  in  Regression 
The  results  of  this  section  closely  follow  those  of  Schwarzmann  (1991), 
with  only  minor  additions.  As  in  Chapter  1,  a  set  of  additive  perturbations  are 
introduced  into  the  observed  data  to  examine  the  influence  of  measurement  error  in 
the  response.  An  n  x  1  vector  of  perturbations  is  introduced  such  that  the  perturbed 
likelihood  is  based  on  Y  +  a;  ~  N{X/3,  a^In)-  The  null  perturbation  is  Wq  =  0„. 

Equivalent  results  can  be  obtained  by  perturbing  the  model  of  the  mean  to  be 
X/3  —  (jj.  That  is,  by  considering  a  model  assuming  Y  ~  N{XP  —  co,  (T^In)-  Hence, 
the  measurement  error  perturbation  can  be  interpreted  as  a  type  of  mean-shift 
outlier  model;  each  perturbation  is  a  "stand-in"  for  an  additional  parameter  that 
could  correct  or  exacerbate  lack-of-fit. 
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2.4.1     Building  Blocks 

The  perturbed  likelihood  is 

T?  1 

K{e;y)oc--\oga'-^{y-X0  +  ujy{y-Xl3  +  uj) 

n  1 

= -- loga^  -  — (y  +  u>)'(y  +  w) 

n  1 

= -- loga^  -  ^(y'y  +  2u;'y  +  w'a;), 

where  y  =  y  -  X^.  The  next  step  is  construction  of  A.  Partial  derivatives  with 
respect  to  a>  are  as  follows:  ., 

dK{0;y)  1 


du: 


2(t2 
1 


(2y  +  2u;) 


1 


(y  +  w) 


--(y-X/3  +  a;). 


Second  partial  derivatives  are  then 

d'L^{0;y)       1 


X 


%l 


Evaluating  at  the  original  model  yields 

d'L^{e;y) 


du:dl3' 
d'L^e^y) 


1 


0=0,U}=OJo         ^ 


divda^ 

From  these  quantities,  the  n  x  p  matrix  A'  can  be  constructed: 

1 


A'  = 


X     4^r 


^  '  --  V 
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Finally,  the  acceleration  matrix  is 


~    '     1 
F  =  2— 


X    I 


<t2  (XX')"'      0 
0' 


2£l 
n 


^2 


X' 

1  ».' 


(2.10) 


2  4 


:^2 


2.4.2     Directed  Influence  on  a 

The  acceleration  matrix  for  the  profile  likelihood  displacement  for  a^  is 
simply  the  second  term  of  (2.10): 


=  — ttTV  . 


na^ 


(2.11) 


Solving  the  characteristic  equations  {F"    -  XI)v  =  0  directly,  the  single 
eigensolution  is 


C/tnax  —  X 


max  —  '^max         a-2 


^max  0^   •  • 


Therefore,  as  mentioned  in  Chapter  1,  Umax  for  directed  influence  on  a'^  is 
proportional  to  the  residuals, 

2.4.3    Directed  Influence  on  ^j 

Without  loss  of  generality,  consider  directed  influence  on  $i.  Rather  than 
find  the  acceleration  matrix,  we  proceed  directly  to  determining  the  sole  eigenvector 
proportional  to  A'T  and  the  corresponding  curvature  2||A'T|p,  as  per  (2.3)  and 


i"  r-' — -  ■e?^> 
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(2.9): 


-X"!      X(i) 


ri\ 


1 
-7i 


=  ^  [^1  -  ^(l)"^!] 


a||ri 
1 

•^Ikil 
a  ri 


rn 


and 


a 


o"  Fil 


r^i 


Therefore,  the  direction  of  maximum  curvature  for  directed  influence  on  $j  is 

proportional  to  the  f^  adjusted  predictor. 

2.4.3.1     Directed  influence  on  /3  • 

The  acceleration  matrix  for  the  profile  likelihood  displacement  for  $  is  the 
first  term  of  (2.10): 


M 


^H. 


(2.12) 


This  matrix  has  a  single  non-zero  eigenvalue  of  ^  with  multiplicity  k,  implying  that 
there  are  an  infinite  number  of  directional  vectors  that  have  maximum  curvature. 
Although  not  discussed  by  Schwarzmann  (1991),  one  set  of  orthogonal  solutions  can 
be  motivated  by  considering  the  canonical  form  of  the  model: 


Y  ^Zoc  +  e, 


(2.13) 
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where 


Z  =  Xifi 

a  =  <fi'l3 

ip  = 

<Pi    ■ 

•    fk 

and 
(p-=  the  i*'*  eigenvector  of  X'X.        '        >. 

Because  (pep'  =  Ik,  this  is  simply  a  reparameterization  of  the  original  regression 
model.  The  new  regressors,  Zi  =  Xip^,  ...,Zk  =  Xp^.,  are  known  as  the  principal 
components,  and  they  are  mutually  orthogonal.   In  addition,  each  satisfies  the 
characteristic  equations  oi  F    :  ' 


{f'^-P)z.  =  ^Ah-i)x^, 


2 

^'  (2.14) 

=  0. 


Therefore,  the  standardized  principal  component  regressors  can  serve  as  an 
orthogonal  set  of  directional  vectors,  each  of  which  has  maximum  curvature.  Note 
also  that  the  k  adjusted  regressors,  ri, . . . ,  rjt,  are  also  directions  of  majcimum 
curvature,  but  they  do  not  constitute  an  orthogonal  set. 
2.4.3.2     Influence  on  9 

The  acceleration  matrix  F  given  in  (2.10)  has  two  non-zero  eigenvalues:  -^ 
and  j2,  with  the  second  one  having  multiplicity  k.  The  first  eigenvector  is  a  r, 
and  the  set  of  k  principal  components  can  be  used  to  complete  an  orthogonal  set  of 
eigenvectors.  Comfirmation  of  these  solutions  is  straightforward. 

■     '         2.5     Unequal  Measurement  Precision  Applied  to  Regression 

Cook  introduced  this  perturbation  in  his  original  (1986)  paper  on  local 
influence,  and  the  results  in  this  section  come  from  his  work.  An  n  x  1  vector 
of  perturbations  is  introduced  such  that  the  perturbed  likelihood  is  based  upon 
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Y  ~  N{X0,a'^D{l/u})),  where  D{l/u:)  denotes  a  diagonal  matrix  with  i*''  element 
^.  The  null  perturbation  is  a?o  =  In-  The  n  basic  perturbations  would  be  the  local 
versions  of  the  n  Cook's  distances  (Cook  1977;  Cook  1979).  In  addition,  the  n  basic 
perturbations  applied  to  LD^^^u)  are  local  influence  analogs  of  the  n  DFBETAS 

for  ^j. 

A  second  interpretation  of  this  perturbation  scheme  comes  from  the  idea 
of  case- weights.  Case- weights  are  used  in  an  analysis  when  each  observation  has 
been  sampled  with  unequal  probability.  Under  disproportionate  sampling,  cases  are 
weighted  by  the  inverse  of  their  sampling  probability  in  order  to  obtain  unbiased 
regression  parameter  estimates.   In  the  normal  linear  model,  this  weighting  is 
numerically  equivalent  to  perturbing  the  error  variances.  Hence,  this  perturbation 
scheme  can  be  used  to  examine  sensitivity  to  the  assumption  of  random  sampling. 

2.5.1     Building  Blocks 

To  begin,  we  develop  the  perturbed  likelihood,  which  is 

■n  1 

U0;y)  oc  ~^\oga'  -  — (y  -  X^)'D(u;)(j/  -  Xfi) 

=  -^  log a^  -  —y'D{uj)if. 

The  next  step  is  construction  of  A.  Partial  derivatives  with  respect  to  /3  are  as 
follows: 

^^"j^'  ^^  =  -^(-2X'D(u;)y  +  2X'D{u:)Xfi) 

=  ^X'D{u:)y, 


<t2' 


and  the  partial  derivative  with  respect  to  a   is 


dL^{0\y)  n  1    ,, 
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Second  partial  derivatives  are  then 


and  evaluating  at  the  original  model  yields: 


d'LUe;y) 


dl3du}' 


1 


da^doj' 


2^4^^'^^        2^4  ^*«' 


0=9,uj=u)o 

where  •  is  the  denotes  the  Hadamard  (element- wise)  product,  implying  that  r,,  is 
an  n  X  1  vector  with  elements  rf.  Combining  these  we  can  write 

1 


A'  = 


D{r)X    ^r 


2ct2  '  sq 


and 


■  F^2a^ 


^D{r)X 


+ 


4a' 


n 


(XX')" 


T^X'Dir) 


(2.15) 


a2 


D{r)HD{r)  + 


2na-'   ''"^ 


2.5.2     Directed  Influence  on  a"^ 

The  acceleration  matrix  for  the  profile  likelihood  displacement  for  a^  is 
simply  the  second  term  of  (2.15): 


Solving  the  characteristic  equations  directly,  the  single  eigensolution  is 


(2.16) 


"^max  ^  ^ sq 
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This  suggests  that  o^  is  most  affected  by  variance  perturbations  when  they  are 
scaled  according  to  the  squared  residuals. 

2.5.3    Directed  influence  on  yg,- 

Without  loss  of  generality,  consider  directed  influence  on  ^i.  Again  we  will 
proceed  directly  to  determining  the  sole  eigenvector  proportional  to  A'T  and  the 
corresponding  curvature  2||A'T|p  as  per  (2.3)  and  (2.9): 


Xi    X(i) 

a 

1 
-7i 

a  T\ 

= r>(r)ri 

(Xr  -Ty 

(2.17) 


and 


t.'max  —  •^ 


1  -n       ^     112  0„     11^"  ''I 

-r  •  Ti      =  zn 


^  n 


r||2||ri||2- 


(2.18) 


This  suggests  that  maximum  local  influence  of  the  perturbation  on  $i  is 
achieved  by  perturbing  cases  that  have  both  large  residuals  and  extreme  values  for 
the  adjusted  covariate.  Cook  (1986)  provided  additional  details  of  how  this  result 
compares  to  the  information  in  a  detrended  added- variable  plot. 

Also,  we  may  compare  the  information  in  Umax  with  that  provided  by  the 
DEBET  AS  diagnostic  (Belsley  et  al.  1980).  Recall  that  the  value  of  DEBETASij 
is  the  standardized  diflference  between  $j  and  /3j,(i),  the  estimate  when  deleting  the 
i*''  observation.  The  formula  is  given  by 


DEBET  ASij  = 


^ij'i 


^Jv^) 


s\\cj\\{\-hii) 
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where  r^  is  the  i*''  residual,  Cj  is  the  f"  column  of  X{X'X)-\  Cij  is  the  i^^  element 
of  Cj,  and  s^  =  ^^^  is  the  mean  squared-error  estimate  of  cr^.  The  vector  Cj  can  be 
written  in  terms  of  the  /''  adjusted  regressor  by  partitioning  (X'X)"^  as  per  (2.7). 
For  i  =  1, 


Ci  = 


Xi    X 


(1) 


ri 


1 

-7i 


'•i 


jri. 


Using  this  equality,  the  vector  of  n  DEBET  AS  for  /3j  is  proportional  to 


/ 


1 


\ 


i-ftii 
1 


i-ft 
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r  •  Ti 


(2.19) 


\  i-/i„„  / 
Since  Umax  oc  r  •  rj,  observations  with  large  leverage  stand  out  more  in  a  plot  of  the 
n  DEBET  AS  for  /3j  than  in  a  plot  of  Vmax-  This  algebraic  relationship  between 
Vmax  and  DEBET  AS  has  not  been  noted  by  previous  authors. 

2.5.4     Directed  influence  on  ^ 

The  acceleration  matrix  for  directed  influence  on  y3  is  the  first  term  of  (2.15): 

2 


F^^  = 


[D{r)HD{r)]. 


The  eigensystem  of  the  matrix  does  not  have  a  closed-form.  However,  some  idea  of 
the  directed  influence  of  this  perturbation  on  ^  can  be  ascertained  by  utilizing  basic 
perturbations.  The  i^^  basic  perturbation  corresponds  to  modifying  the  variance  of 
only  the  i"*  case,  and  its  curvature  is  the  i^^  diagonal  element  of  F     (Cook  1986): 

2 


(2.20) 
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(b)  vz  for  0 


Figure  2.1:  Hills  data;  variance  perturbation 


This  is  similar  to  the  i*^  Cook's  Distance  (Cook  1977): 

.      •    V  1 


k{\  -  hiifs 


2.5.5    Influence  on  0 


The  eigensystem  of  F  is  not  closed- form,  but  it  can  be  found  numerically. 
As  an  example,  consider  the  perturbation  of  the  error  variances  for  the  Scottish 
hills  data.  There  are  three  non-zero  eigenvalues  for  F:  14.82,  4.91,  and  0.37.  The 
eigenvector  associated  with  the  largest  eigenvalue  was  plotted  in  Figure  1.10(a).  It 
is  similar  to  v^aax  for  directed  influence  on  o-^,  as  given  in  Figure  1.10(b).  The  other 
two  eigenvectors  for  F  are  plotted  in  Figure  2.1.  :   " 

2.6     Benchmarks  for  Perturbation  Analyses     -• 
An  analyst  using  local  influence  diagnostics  will  want  to  know  when  a 
curvature  is  "too  big"  and  when  an  element  of  Umax  is  "too  big."  Calibration  of  local 
influence  diagnostics  was  the  topic  of  much  debate  in  the  discussion  and  rejoinder 
to  Cook  (1986),  and  there  is  still  no  consensus  opinion  on  the  subject.  Much  of  the 
difficulty  lies  in  the  fact  that  the  perturbation  is  introduced  into  the  problem  by  the 
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analyst.  Because  the  technique  does  not  involve  hypothesis  testing,  critical  values 
are  not  appropriate.  However,  there  have  been  attempts  to  develop  rule-of-thumb 
benchmarks,  as  exist  for  Cook's  Distance  and  other  common  influence  diagnostics. 

In  this  section,  we  discuss  some  general  difficulties  in  developing  benchmarks 
for  perturbation  analyses,  as  well  as  some  issues  specific  to  the  local  influence 
approach.   Although  it  is  argued  that  all  sensitivity  is  "relative",  some  simple 
methods  for  assessing  the  size  of  curvatures  and  the  size  of  the  elements  of  directional 
vectors  are  presented.       ,  ,, 

2.6.1  The  Analyst's  Dilemma        .      ,    ■,,      '* 

As  mentioned  before,  the  perturbation  is  introduced  by  the  analyst,  and  this 
implies  that  he/she  must  decide  whether  the  effects  of  perturbation  are  important, 
much  the  way  a  researcher  decides  what  effect  size  is  important.  Consider  a  situation 
in  which  the  researcher  is  actually  able  to  give  the  size  or  stochastic  behavior  of 
the  perturbation.   Here,  we  may  concretely  measure  the  effects  of  perturbation 
by  whether  or  not  the  conclusion  of  a  hypothesis  test  changes.  Alternatively,  the 
effects  can  be  measured  by  whether  or  not  the  perturbed  estimates  are  outside  an 
unperturbed  confidence  ellipsoid.  However,  despite  having  a  concrete  measure  of  the 
perturbations's  effects,  assessing  the  practical  relevance  of  the  results  still  falls  to 
the  analyst. 

2.6.2  Calibrating  Curvatures  in  Local  Influence 

In  addition  to  the  general  limitations  of  a  perturbation  analysis,  an  analyst 
using  the  local  influence  approach  must  also  contend  with  the  fact  that  the  behavior 
and  size  of  the  perturbations  are  undetermined  apart  from  the  basic  formulation. 
Loynes  (1986)  and  Beckman  et  al.  (1987)  noted  that  curvatures  do  not  have 
the  invariance  properties  needed  for  a  catch-all  method  of  benchmarking.  This 
necessitates  a  reliance  upon  relative  influence. 
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2.6.2.1     1-to-l  coordinate  changes  in  fl 

Following  Loynes  (1986)  and  Beckman  et  al.  (1987),  let  us  consider  1-to-l 
coordinate  changes  in  the  perturbation  space  ft.  That  is,  a  new  perturbation  scheme 
is  constructed  with  elements  u*  =  k{LOi)  ior  i  =  1, ...  q,  for  a  smooth  function  k.  The 
new  scheme's  perturbation,  a;*,  will  not  yield  the  same  curvature  as  uj.  Specifically, 


Coi*  =  (^)  _     C^y  where  uiig  denotes  the  null  perturbation  value  for  a;, 


^duii 


U=Ui, 


As  an  example,  consider  perturbing  the  variance  of  j/j  to  be  5?  rather  than  rr- 

i  • 

Here,  k{uji)  =  uof,  resulting  in  new  curvatures  that  are  (^l^i^i)^  =  (2a;i)^.^i  =  4 
times  the  original  ones.   Hence,  despite  the  fact  that  the  two  versions  of  fl  can 
produce  the  same  set  of  perturbed  models,  the  curvature  in  identical  directions  will 
change.   This  implies  that  a  universal  benchmark  for  curvatures  is  not  possible. 
Indeed,  given  the  arbitrariness  of  scaling  the  perturbation,  it  should  come  £is  no 
surprise  that  curvatures  cannot  be  compared  across  perturbation  schemes.  However, 
the  direction  of  maximum  curvature  is  unchanged  under  1-to-l  coordinate  changes 
in  fl.  '    ■    ,■ 

2.6.2.2     Interpreting  curvatures         •    % 

In  this  section,  a  simple  way  to  assess  curvatures  from  the  same  perturbation 
scheme  is  given.   First,  yC  is  a  Taylor  series  approximation  to  LD{u)q  +  av) 
(Lawrance  1991;  Escobar  and  Meeker  1992).  Also,  assuming  that  n  is  large,  an 
asymptotic  100(1  —  q)%  likelihood-based  confidence  region  for  0  is  given  by  values 
oi  0  such  that 

2{L{9-y)-L{9-y))<xl,-^. 

Thus,  if  twice  the  curvature  in  direction  v  is  larger  than  Xp.i-o?  then  a  perturbation 
at  a  distance  of  one  from  Wq  in  direction  v  moves  the  parameter  estimates  to  the 
edge  of  the  asymptotic  confidence  region. 
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To  further  develop  this  idea,  let  us  define  the  size  of  a  perturbation  to  be 
its  Euclidean  distance  from  the  null  perturbation  (i.e.,  ||aj  —  ci>o||).  Suppose  further 
that  we  have  perturbations  in  two  different  directions:  u;^  in  direction  va  and  ujb 
in  direction  ug.  If  |C^|  is  fc  times  larger  than  \Cb\,  then  perturbation  wg  must  be 
s/k  times  larger  than  u;^  to  have  the  same  effect  on  the  likelihood  displacement. 
Equivalently,  in  order  to  perturb  the  estimates  to  the  edge  of  the  confidence  ellipsoid, 
it  would  be  necessary  to  travel  \/k  times  farther  in  direction  vg  in  ft  as  in  direction 
v^.  This  provides  a  simple  interpretation  for  the  relative  size  of  two  curvatures. 

2.6.2.3  Internal  benchmarks  for  curvatures 

Building  on  the  interpretation  given  above,  curvatures  can  be  benchmarked 
based  on  their  relative  size  to  some  sort  of  "average"  curvature  for  a  given  scheme 
and  fitted  model.  For  instance,  the  average  of  the  q  basic  curvatures  for  the  fitted 
model  can  serve  as  a  benchmark.   Alternatively,  the  average  curvature  could  be 
computed  by  (1)  allowing  the  squared  elements  of  v  to  have  a  Dirichlet(ai . .  .ag) 
distribution  with  all  oij  equal  to  one,  and  then  (2)  finding  the  expected  curvature. 
This  would  be  equivalent  to  finding  the  average  approximate  displacement  at  a 
radius  of  \/2  from  Wq  (Cook  1986).  If  such  a  benchmark  is  difficult  to  compute,  a 
Monte  Carlo  estimate  can  be  calculated.  Because  these  benchmarks  are  computed 
using  curvatures  from  the  fitted  model,  they  result  in  comparisons  that  are  internal 
to  the  fitted  model.  The  conformal  normal  curvature  of  Poon  and  Poon  (1999) 
would  also  be  an  internal  way  to  assess  the  sizes  of  curvatures. 

2 .6.2 .4  External  benchmarks  for  curvatures 

It  is  entirely  possible  that  all  of  the  curvatures  of  a  fitted  model  are  large 
due  to  the  nature  of  the  data  set,  and  using  internal  benchmarks  will  not  alert 
the  researcher  to  such  a  phenomenon.  To  address  this,  a  benchmark  that  is  not 
dependent  on  the  observed  data  is  needed.    To  this  end,  we  might  utilize  the 
expected  value  of  the  curvature  in  direction  v,  using  0  in  place  of  the  unknown 
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parameters  0.  The  expected  curvatures  can  then  be  plotted  along  with  the  observed 
curvatures  (Cadigan  and  Farrell  1999).  If  the  expectation  of  Cmax  is  difficult  to 
obtain,  Monte  Carlo  simulations  could  be  used  to  estimate  it.  As  an  alternative, 
Schall  and  Dunne  (1992)  developed  a  means  of  external  benchmarking  by  developing 
a  scaled  curvature  using  results  on  parameter  orthogonality  (Cox  and  Reid  1987). 
2.6.2.5     Internal  versus  external  benchmarks 

Internal  and  external  benchmarks  are  complimentary  tools,  since  each  one 
provides  different  diagnostic  information.  Comparing  curvatures  from  the  same 
fitted  model  highlights  relative  sensitivity  given  the  data  and  the  design.  Comparing 
curvatures  to  their  expectations  highlights  whether  the  sensitivities  are  expected 
given  the  experimental  or  sampling  design. 

2.6.3    Calibration  of  Directional  Vectors 

For  a  directional  vector  from  the  eigensystem  of  F,  the  analyst  must  compare 
the  individual  elements  in  the  vector  to  determine  which  aspects  of  the  perturbation 
are  most  important.  Typically,  it  is  suggested  that  the  analyst  simply  look  for 
elements  that  stand  out  as  being  large  relative  to  the  rest.  Alternately,  a  vector 
whose  elements  are  all  equal  in  size  can  serve  as  a  means  of  comparison.  That  is  to 
say,  if  each  of  the  q  elements  of  the  perturbation  are  contributing  equally,  then  each 
is  equal  to  either  -i=  or  — -^.  A  particular  element  whose  absolute  value  is  larger 
than,  say,  2  or  3  times  that  of  equally  contributing  elements  (e.c.e.)  would  then  be 
of  interest.  Hence,  the  inclusion  of  horizontal  lines  at  ±-^  and  ±-^  in  the  plot  of 
directional  vectors  may  be  useful. 

■  2.7    Local  Assessment  Under  Reparameterizations  ' 

Pan  et  al.  (1997)  show  that  Vmax  is  invariant  to  any  1-to-l  measureable 
transformation  of  the  parameter  space  0.    In  fact,  for  0  =  h{(f>),  the  entire 
eigensystem  of  F^  is  the  same  as  that  of  JFg.  However,  directed  influence  on  the 
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estimates  of  the  redefined  parameters  can  then  be  examined.  Using  similar  steps  to 
Pan  et  al.  (1997)  and  Escobar  and  Meeker  (1992),  we  can  write  update  formulas 
for  A^  and  L^    and  then  calculate  directed  influence  on  a  subset  of  the  redefined 
parameter  estimates.    Using  partial  diflFerentiation,  the  building  blocks  of  the 
acceleration  matrix  under  the  parameterization  </>  can  be  written: 


de'..  de    , 


(2.21) 


(2.22) 


where  all  partial  derivatives  are  evaluated  at  the  fitted  model.   From  this,  the 
acceleration  matrix  for  directed  influence  on  <p-^  can  be  obtained  from  the  partition 

</>  =  (</>i,</»2)': 


,[</>! 


-2A;(L 
,00 


-1 


^022  )^<^ 


=  -2A 


where 


dc/) 
=  -2A^(L,    -^B,,,-)A„ 


0       0 


(2.23) 


0    il^ 

?>22 


Zy^22  comes  from  the  partition 


-t'011    L 


4>12 


^4>2\        ^4>22 


and  all  partial  derivatives  are  again  evaluated  at  the  fitted  model. 

Although  Pan  et  al.  (1997)  suggested  that  (2.23)  is  invariant  to  the  choice 
of  027  they  did  not  provide  a  proof.    To  show  this  result,  consider  another 
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reparameterization  for  which  the  first  pi  elements  are  the  same  as  those  of  (f>: 


V'l 

01 

xj,= 

= 

V'2 

^2 

This  implies  that 


dtj)' 


Ipi      0 

d<f>2  9(p2 

dxl>[       9^2 


(2.24) 


and  this  leads  to  the  following  expression  for  the  lower  right  submatrix  of  L^: 


8x1^2      dii)'2 

d(f>'  89'  ■■    80  d(j> 

0 


av'2 


8e' ..   80 


8(f>    "50' 


0 

d4>2 
di>'2 


(2.25) 


_  d^dO^i    00  8(p2 

8'4)2d(j)2     ^8(1)2  811^2 

8x1,2    '^^^ 8xp'2' 
where  all  partial  derivatives  are  evaluated  at  the  original  model.  Meanwhile,  the 
acceleration  matrix  for  directed  influence  on  i/j^  is 


,[1/-! 


-2a;(l;  - 


=  -2a;(l, 


0       0 

)A, 

0    L-^ 

1p22 

80 

0       0 

8x1)' 

0    L 

-1 

022 

(2.26) 


5^ 
8x1) 


)Afi 


The  invariance  of  the  acceleration  matrix  for  directed  influence  on  </>i  to  the  choice 
of  02  is  proved  by  showing  that  (2.26)  is  equivalent  to  (2.23).  Using  the  inverse  of 
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(2.25), 
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0       0 
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Substituting  (2.27)  into  (2.26)  completes  the  result. 

2.7.1     Directed  Influence  on  a  Linear  Combination  of  Parameter  Estimates 

As  mentioned  in  Escobar  and  Meeker  (1992),  a  specific  application  of  these 
results  is  to  consider  directed  influence  on  a  linear  combination  of  parameter 
estimates,  x0.  There  are  many  ways  to  reparameterize  the  model  such  that  x'0  is 
one  of  the  new  parameters.  For  example,  assuming  that  the  first  element  of  x  is 
non-zero,  </>  can  be  defined  as  A0,  where 

xi      a;(i) 
Op_i    Ip-i 

where  Xi  and  a;(i)  are  the  first  and  remaining  elements  of  a;,  respectively  (Escobar 
and  Meeker  1992).   Using  equation  (2.23),  the  acceleration  matrix  for  directed 
influence  on  ^i  =  x'0  can  be  computed,  and  the  resulting  diagnostics  are  invariant 
to  the  choice  of  (/>2, . . . ,  0p.  By  using  a  prudent  choice  of  new  parameters,  we  can 
obtain  easy-to-compute  formulas  for  the  single  eigensolution  of  F      ,  as  per  the 
following  theorem. 


Theorem  2.1 

The  maximum  curvature  and  its  direction  for  directed  influence  on  x'O  is  given  by 

where  ki  =  Lg      x. 
Proof  of  Theorem  2.1 

••  1/2 

To  begin,  consider  a  model  with  parameterization  (j)  —  K'Lg    0,  where 


K 


is  a  full  rank  p  x  p  matrix  with 


Ki      K2      ■  ■  •      Kr 


ki  =      Lg        X 

>-  Kkj=      0  iy^j 

k[ki  =    k[ki        i  =  2. .  .p. 

By  construction, 

Using  the  update  formulas  in  (2.21)  and  (2.22)  it  follows  that 

A         9^'  A 


iifciir 


"''■W' 


45 


Next,  we  show  that  directed  influence  on  ^i  =  x'0  is  not  a  function  oi  k2,  ■ .  ■ ,  kp. 
From  equation  (2.23),  we  find 

\ 


f'''  =  2(, 


1 


-1/2 


Ifell 


A'.Le'  K)     ||fci||^J-||fex| 


0    0' 

0    I 
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'     :A'L-'^'k 


IfcllP      "     ' 


1     0' 
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Ifel 


(2.28) 


The  single  eigensolution  of  this  rank-one  matrix  yields  the  final  solution  for  Cmax 

and  Umax-        □     ,         ■ 

The  construction  of  0  is  worth  some  note.  The  premultiplication  of  0  by 

•■  1/2 

K'Lg     orthogonalizes  the  parameters,  in  the  sense  that  the  off-diagonal  elements 
of  the  observed  information  matrix  for  </)  are  zero.  (Note  that  the  orthogonalization 
is  with  respect  to  the  observed  information  matrix  as  in  Tawn  (1987),  rather 

••  1/2 

than  the  expected  information  as  in  Cox  and  Reid  (1987).)   First,  a  =  Lg    0 
is  a  parameterization  with  an  observed  information  matrix  equal  to  the  identity 
matrix.  Then,  K  is  constructed  to  make  orthogonal  contrasts  of  a,  with  the  first 
column  corresponding  to  the  linear  combination  of  interest.  Finally,  by  scaling  the 
columns  of  K  to  all  have  the  same  length  as  fei,  K  becomes  a  scalar  multiple  of 
an  orthonormal  matrix,  and  this  leads  to  the  algebraic  simplifications.  Also,  if 
x'6  =  9j  the  local  influence  diagnostics  provided  by  the  theorem  are  the  same  as 
those  provided  by  (2.3),  as  per  Beckman  et  al.  (1987). 
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2.7.2     Application  to  Regression 

As  an  example,  consider  perturbing  the  variances  of  a  regression  model  and 
examining  influence  on  a  linear  combination  of  the  regression  parameter  estimates, 
fi{x)  =  x'yS.  From  (2.28),  we  find 


^    D{r)X{X'X)-^xx'{X'Xy^X'D{r), 


(2.29) 


where  K  =  x'{X'X)~'^x  .  Further, 

v^^(xx'{X'X)-'^X'D{r)(x    Cov[X'fi,x'p)  ■  r  (2.30) 

.  C„,a,  =  4-|l^'(^'^)''^'^('-)ir.  -•     "        (2-31) 

where  Cov(X^,  x'yS)  is  the  vector  of  empirical  covariances  between  X0  and  x'/9. 
2.7.2.1     A  DFFITS-type  measure 

Using  the  variance  perturbation  and  directed  influence  on  x  0,  a  local 
influence  analog  of  DFFITS  can  be  defined.  Recall  that  DFFITSj  is  defined  as  the 
externally  studentized  change  in  the  i^^  fitted  value  when  deleting  the  i*'*  observation 
(Belsley  et  al.  1980): 

DFFITS.  =  ^^^1^^£^, 

where  fl{xi)[i]  is  the  estimate  of  //(xj)  with  the  i*^  observation  deleted  and  sh  is  the 
mean  squared-error  estimate  of  a^  with  the  i*''  observation  deleted.  A  large  value 
for  DFFITS  indicates  that  the  observation  is  somewhat  "self-predicting" .  It  can  be 
shown  that  DFFITS,  simplifies  to 
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A  similar  measure  can  be  obtained  by  examining  local  influence  on 
p,{Xi)  =  a;-3  when  using  the  i*^  basic  variance  perturbation.   That  is,  only  the 
i*'*  observation  is  reweighted,  and  the  curvature  for  directed  influence  on  /i(cc,)  is 
examined.  The  curvature  for  the  i*''  basic  perturbation  is  the  i*'*  diagonal  element  of 

Cft.  =  BasicFITSi  =  ^^.  (2.32) 

A  large  value  of  BasicFITS  indicates  that  the  perturbation  of  the  observation's 
precision/weighting  greatly  influences  its  fitted  value.  Thus  it  is  a  local  influence 
analog  of  DFFITS.  Coincidently,  BasicFITSj  is  equivalent  to  the  i^^  basic  curvature 
for  the  acceleration  matrix,  as  given  in  (2.20).  That  implies  that  for  the  variance 
perturbation,  the  n  basic  curvatures  of  LD{u>)  are  measuring  influence  on  the  n 
fitted  values. 

While  BasicFITSj  is  the  f*''  basic  curvature  when  directing  influence  on  //(xj), 
we  can  also  define  MaxFITSi  to  be  C^ax  for  directed  influence  on  fi{xi).   Using 
(2.31),  .  . 

2        " 
C:nax.  =  MaxFITSj  =  T^  J]  ^r-J.  (2.33) 

By  definition  the,  i*''  MaxFITS  can  be  no  smaller  than  the  i^^  BasicFITS.  While 
BasicFITS  provides  information  about  which  observation's  perturbation  influences 
its  fitted  value,  MaxFITS  provides  information  about  how  sensitive  each  fitted  value 
is  relative  to  the  others.  • 

Example 

Consider  examining  influence  on  the  fitted  values  for  each  of  the  35  covariate 
profiles  in  the  Scottish  hills  data  set.  Figure  2.2(a)  displays  the  35  BasicFITS  (i.e., 
each  of  the  basic  curvatures  for  directed  influence  on  the  corresponding  fi{xi))  . 
Race  7  is  indicated  as  being  influential  on  its  fitted  value,  while  race  18  is  also  shown 
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.:  ,         ..  (c)  BasicFITS  (stars  connected  by 

solid  line)  and  MaxFITS  (triangles 
connected  by  dashed  line) 

Figure  2.2:  Hills  data;  cr^D(l/u;)  perturbation;  influence  on  fitted  values 


to  be  quite  influential  on  its  fitted  value.  Plot  (b)  shows  the  35  MaxFITS  (i.e.  the 
C'max's  for  directed  influence  on  each  of  the  35  fi{xi)).  Since  no  point  stands  out 
as  being  large,  the  plot  indicates  that  no  fitted  value  is  particularly  sensitivety  to 
variance  perturbations.   Plot  (c)  shows  both  sets  of  diagnostics.   The  BasicFITS 
values  are  stars  joined  by  a  solid  line,  while  the  MaxFITS  values  are  triangles  joined 
by  a  dashed  line.  The  closeness  of  the  MaxFITS  and  BasicFITS  values  for  race  7 
indicates  that  this  observation  is  somewhat  self-predicting. 
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CHAPTERS 
PERTURBATIONS  TO  THE  X  MATRIX  IN  REGRESSION 

In  this  chapter,  the  influence  that  infinitesimal  errors  in  the  X  matrix  can 

have  on  parameter  estimates  in  a  linear  regression  is  addressed.   First,  previous 

results  for  single  column  and  single  row  perturbations  are  reviewed.  This  is  followed 

by  a  generalization  of  these  results  to  perturbing  all  elements  in  the  X  matrix.  It  is 

shown  that  the  eigensolution  of  F     is  related  to  principal  components  regression, 

collinearity,  and  outliers.  Finally,  new  graphical  techniques  are  presented  with  the 

results  applied  to  the  Scottish  hills  data  set  and  an  educational  outcomes  data  set. 

3.1     Perturbations  to  Explanatory  Variables 
Letting  k  denote  the  number  of  predictors.  Cook  (1986)  proposed  that  an 
nA;-sized  perturbation  can  be  incorporated  into  the  design  matrix  in  an  additive 
fashion.  These  perturbations  represent  fixed  errors  in  the  regressors  and  should 
not  be  confused  with  random  errors-in- variables  techniques  (Fuller  1987).   The 
perturbation  scheme  can  be  used  to  assess  the  effects  of  small  rounding  or  truncation 
errors  in  the  explanatory  variables.  Upon  perturbation,  the  X  matrix  becomes 


X  + 


UJi 

W„+l 

^2n+l      ■ 

•  •      ^kn-n+1 

UJ2 

^n+2 

... 

■  ■     ljJkn-n+2 

Us 

■  ■     ^kn-n+3 

OOn 

i^2n 

W3n 

■■      OJkn 

The  use  of  a  single  subscript  implies  w  =  vec(W^),  where  the  vec  operator  stacks  the 
columns  of  the  matrix  into  a  single  column.  When  particular  elements  of  W  greatly 
aflPect  parameter  estimation,  it  indicates  that  the  corresponding  element  of  X  is 
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'^-^C  50 

influential.  This  allows  the  analyst  to  identify  specific  combinations  of  explanatory 
variable  values  that  influence  estimation.  As  will  be  shown  in  Section  3.4,  sensitivity 
of  the  regression  parameter  estimates  to  these  kinds  of  perturbations  is  greater  when 
there  is  coUinearity  amongst  the  regressors  and  when  the  model  is  calibrated  to  fit 
the  observed  data  too  closely. 

3.2     Perturbations  to  a  Single  Column 
The  special  case  of  only  perturbing  a  single  column  of  X  was  discussed 
in  Cook  (1986),  Thomas  and  Cook  (1989),  and  Cook  and  Weisberg  (1991).  This 
section  recreates  their  results  in  detail. 

3.2.1     Building  Blocks 

Here  and  in  the  following  sections,  di  represents  a  vector  with  a  1  in  the  «'*'' 
position  and  zeros  elsewhere.  The  length  of  the  vector  depends  on  the  context  of  its 
use.  Note  that  for  di  of  size  k, 

d[{X'Xr'di  =  rr^  (3.1) 

1 1  '  1 1 1 


and 


X{X'X)-'di  =  r-^ri,  :'  (3.2) 

Ir  ill  .. 

where  r,  contains  the  residuals  from  regressing  Xi  on  the  other  columns  of  X. 

Without  loss  of  generality,  suppose  perturbations  are  added  to  the  n  values 

of  the  first  predictor  (i.e.,  Xi  becomes  Xi  +u}).  The  perturbed  likelihood  is 

n  I 

K{9;  y)  oc  --  loga^  -  --{y  -Xf3-  Aa;)'(y  -X/3-  ^u;) 

-  ^  ^^  (3.3) 

=  ~  loga^  -  —{y'y  -  2My  +  ^V^). 
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The  next  step  is  the  construction  of  A.  Partial  derivatives  with  respect  to  u)  are  as 
follows: 


Second  partial  derivatives  are  then 


d'LUe;y)  ^     A 
d'LUe;y)  1 


-(y-X(i)/3(i)-2A(A:i+a;)) 


X(i) 


duda^ 


—  (y-X0-/3,u:). 


Evaluating  at  the  original  model  yields 


..    du}d/3[ 
d^LU0;y) 


^{r-^X^) 


d'K{0;y) 


0=B,u:=u:o 


.  '  duda"^ 

and  from  these  quantities  we  construct  the  n  x  p  matrix 

1 


A'  = 


1 


(3.4) 


r  -  A^i         -A-X:(i) 
rd[  -  AX 
where  di  is  a  A;  x  1  vector.  It  follows  that  the  acceleration  matrix  is  given  by 


rd[-PiX    -kr 


a^  {XX'y'^      0 
0' 


Ml 

n 


dir'  -  Z^iX' 


-^r' 

/T'* 


(3.5) 


J-^mrr'.PlH-Jl^(rr',^ry) 
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3.2.2     Directed  Influence  on  a^ 


The  acceleration  matrix  for  the  profile  likelihood  displacement  for  a^  is  given 


by 


n 

=  — jrr  . 


(3.6) 
(3.7) 


Solving  the  characteristic  equations  {F       -  XI)v  =  0  directly,  the  single 
eigensolution  is 


C„ 


4/3? 


l^max  0^  '  • 


Therefore  the  direction  of  maximum  curvature  for  directed  influence  on  a'^  is  always 
proportional  to  the  residuals  regardless  of  which  column  of  X  is  perturbed,  but  the 
corresponding  curvature  is  a  dependent  upon  the  effect  size,  \$i\. 

3.2.3     Directed  Influence  on  /9i 

Using  the  results  of  Beckman  et  al.  (1987),  the  sole  eigenvector  is  proportional 
to  A'T  and  the  corresponding  curvature  is  2||A'T|p  as  per  (2.3): 


'^max  CX    .  - 


r-AXi    -AX 


(1) 


n 


1 

-7i 


1 


(^\\ri\ 
1 


cr||ri 
oc  r  —  /3iri 


r-/3iXi+/5i7iXi 
r  -  /3iri 


,    (3.8) 
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and 


'^max  — 


mr,\\^ 


*9II      21 


=  l[^'t^ 


(3.9) 


o^  \  \\ri\\ 

The  direction  of  maximum  curvature  is  a  weighted  average  of  the  residuals  and  the 
adjusted  regressor.  As  noted  in  Cook  and  Weisberg  (1991),  ^iVi  —  y  —  y^i),  where 
y(i)  is  the  vector  of  fitted  values  when  Xi  is  excluded  from  the  model.  Using  this 
result,  we  find  Vmax  oc  2r  —  r^^,  where  r^j  is  the  vector  of  residuals  when  modeling 
y  using  all  regressors  except  Xi.  This  indicates  that  the  direction  is  dominated  by 
the  residuals,  a  fact  which  is  demonstrated  by  examples  later  in  this  chapter.  Cook 
and  Weisberg  (1991)  discussed  how  Umax  relates  to  de-trended  added  variable  plots. 

3.2.4     Directed  Influence  on  Other  Regression  Parameter  Estimates 

Thomas  and  Cook  (1989)  examined  the  influence  that  perturbations  in  one 
column  of  X  can  have  on  the  regression  parameter  estimate  corresponding  to  a 
different  column.  Without  loss  of  generality,  suppose  we  perturb  the  first  column  of 
X  and  look  at  the  effects  on  the  last  regression  parameter  estimate  ^j..  Again  using 
the  results  of  Beckman  et  al.  (1987)  we  can  obtain  the  maximum  curvature  and  its 
direction.  Letting  di  be  of  size /:  — 1, 


v^ax  OC  A'T  =      J,  (rd[  -  /3iX(fc))         -^X^ 


Fit 


=  jrrrr;  [-rd[%  +  kH(k)Xk  -  PiXk 


-Ik 
1 


(3.10) 
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where  7fci  is  the  regression  parameter  estimate  corresponding  to  Xi  when  predicting 
Xk  with  the  other  columns  of  the  X  matrix.  The  maximum  curvature  is  given  by 

2 


a 


Ikir'  +  ^ir'k 


lk\r  +  /3irfc 


2    ^02    ,    -2    ll^l 


■y,  — 


/5i^  +  7.V 


(3.11) 


When  the  partial  correlation  between  Xi  and  X}.  is  zero,  Cmax  achieves  its  minimum 
value  of  ^,  yielding  Vmax  ex:  r^.  In  the  event  that  Xk  is  a  linear  combination  of  the 
other  regressors  with  non-zero  7^1,  Cmax  =  00  and  Umax  ex:  r.  Between  these  two 
extemes,  the  direction  of  maximum  curvature  is  a  weighted  average  of  the  residuals 
and  the  adjusted  k*^  regressor.  These  quantites  are  used  to  detect  outliers  and 
measure  leverage,  respectively.  If  the  partial  correlation  between  Xi  and  X^  is  as 
large  as  that  between  Xi  and  y,  then  the  two  vectors  are  equally  weighted.  Note 
that  this  weighting  is  invariant  to  scale  changes  in  the  columns  of  the  design  matrix 
as  well  as  the  response. 
3.2.4.1     Directed  influence  on  0 

Without  loss  of  generality,  we  again  consider  perturbing  the  first  column. 
The  rank-A;  acceleration  matrix  is 


F^'  =  — 


VlFilr/  Irilr 


(3.12) 


The  first  eigenvalue  of  this  matrix  is  unique,  and  yields  a  maximum  curvature  and 
corresponding  direction  of 

Cmax  =  J2  [$i  +  ^)       and         viocr-  ^^r^ . 


Hence,  the  maximum  curvature  and  its  direction  are  identical  to  that  of  directed 
influence  on  /5i.  Confirmation  of  this  first  eigensolution  follows: 

-(A'lkiir  +  ||r|ni(r--/3iri) 

=  ||r||V  -  AlkllVi  -  ^f llnll^if n  + /32||ri||V 

-/5il|n||V-||r||V  +  /3i^||ri||Vx+A||r-||Vi 
=  -^i^||ri|plfri+/5f||ri||Vi         ,, 
=  0. 

The  second  eigenvalue  is  ^/5f  and  has  multiplicity  A;  —  1.  As  mentioned  earlier, 
this  implies  that  there  is  no  unique  set  of  orthogonal  eigenvectors  to  complete  the 
solution. 

3.2.5     Influence  on  0 

The  acceleration  matrix  for  examining  both  ^  and  a^  simulatenously  has 
k  +  1  =  p  non-zero  eigenvalues.  The  form  of  the  eigensolution  is  unknown.  My 
numerical  studies  indicate  that  there  are  three  unique  non-zero  eigenvalues,  one  of 
which  has  multiplicity  k  -  I.  The  one  with  higher  multiplicity  appears  to  be  the 
second  largest  of  the  three,  with  a  value  of  ^^l- 

Example 

Huynh  (1982)  and  Fung  and  Kwan  (1997)  examined  data  from  20  randomly 
sampled  schools  from  the  Mid-Atlantic  and  New  England  States.  The  mean  verbal 
test  score  (VSCORE)  of  each  school's  6th  graders  is  regressed  on  staflP  salaries  per 
pupil  (SALARY),  %  of  white-collar  fathers  (WCOLR),  a  socio-economic  composite 
score  (SES),  mean  teacher's  verbal  score  (TSCORE),  and  mean  mother's  educational 
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Table  3.1:  School  data;  Pearson  correlation  coefficients 


SALARY 

WCOLR 

SES 

TSCORE 

MOMED 

VSCORE 

SALARY 

1.00 

WCOLR 

0.18 

1.00 

SES 

0.22 

0.82 

1.00 

TSCORE 

0.50 

0.05 

0.18 

1.00 

MOMED 

0.19 

0.92 

0.81 

0.12 

1.00 

VSCORE 

0.19 

0.75 

0.92 

0.33 

0.73 

1.00 

level  (MOMED).  Appendix  A  contains  model  fitting  information  and  diagnostics  for 
the  regession  analysis,  with  all  variables  centered  and  scaled. 

Diagnostic  plots  of  the  residuals,  externally  studentized  residuals,  Cook's 
Distances,  diagonal  elements  of  the  hat  matrix,  and  DEBET  AS  appear  in  Figure 
3.1.  Ordinary  least  squares  regression  indicates  cases  3  and  18  are  outliers,  although 
Huynh  (1982)  also  identified  observation  11  as  a  mild  outlier.  This  data  set  also 
has  some  redundancy  among  the  predictors  SES,  WCOLR  and  MOMED  (Table 
3.1).  Although  the  X  matrix  is  not  ill-conditioned,  the  association  between  SES 
and  MOMED  accounts  for  the  unexpected  negative  slope  estimate  for  MOMED. 

Table  3.2  shows  the  non-zero  curvatures  (eigenvalues)  for  perturbing 
single  columns  of  the  X  matrix.   Comparing  the  columns  of  the  table  provides 
information  on  which  regressor's  measurement  is  most  influential.   Here  we  see 
that  the  measurement  of  SES  is  typically  the  most  influential,  both  for  regression 
parameter  estimates  and  for  a"^.  Comparing  the  rows  provides  information  on  which 
parameter  estimates  are  most  influenced  by  these  kinds  of  perturbations.  Here  we 
see  that  WCOLR  and  MOMED  estimates  are  the  most  sensitive.  These  are  the  two 
regressors  that  are  highly  correlated  with  each  other  and  with  SES.  It  comes  as  no 
surprise  that  the  results  for  collinear  explanatory  variables  are  more  susceptable  to 
measurement  errors  of  this  nature. 
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Figure  3.1:  School  data;  diagnostic  plots 
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Table  3.2:  School  data;  curvatures  using  Xi  +  u  perturbation 

Estimates  Column  being  perturbed 

in  LD{u}) 

ovLD^^'^iu))     SALARY  WCOLR              SES    TSCORE  MOMED 

e              Ai=4.144  Ai=20.156     Ai=50.008     Ai=6.658  Ai=19.065 

:      A2=0.440  A2=0.847     A2=19.002     A2=1.413  A2=00.932 

A3=0.094  A3=0.071     A3=14.441     A3=0.599  A3=00.091 

$             Ai=3.357  Ai=18.534     Ai=26.445     Ai=4.432  Ai=17.292 

A2=0.440  A2=0.847     A2=19.002     A2=1.413  A2=00.932 


A 

3.357 

0.956 

19.008 

2.126 

0.951 

A 

0.458 

18.534 

19.890 

1.585 

10.869 

A 

0.443 

2.957 

26.445 

1.501 

1.640 

?A 

1.129 

1.859 

19.220 

4.432 

1.295 

A 

A 

,  0.444 

11.590 

19.324 

1.480 

17.292 

^ 

0.881 

1.693 

38.004 

2.825 

1.865 

Since  perturbing  the  SES  variable  is  most  influential,  examining  directional 
vectors  for  this  perturbation  is  of  interest.  The  element  corresponding  to  case  18  is 
the  most  noticeable  in  the  plot  of  Vmax  for  the  likelihood  displacement  (Figure  3.2). 
This  observation  has  a  large  residual  and  is  indicated  as  being  influential  in  the 
various  diagnostic  plots  in  Figure  3.1.  Also  note  the  similarity  of  Figure  3.2  to  the 
residuals  (Figure  3.1(a)). 

Plots  (a)-(e)  of  Figure  3.3  show  the  directions  of  maximum  curvature  when 
directing  influence  on  each  of  the  individual  regression  parameter  estimates.  In 
general  the  plots  indicate  diff'erent  elements  of  X3  to  be  influential  on  different  $i. 
Finally,  Figure  3.3(f)  provides  the  direction  of  maximum  curvature  when  directing 
influence  on  a"^.  Recall  that  this  vector  is  proportional  to  the  residuals. 
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Figure  3.2:  School  data;  X3  +  u;  perturbation;  influence  on  0 

3.3    Perturbations  to  a  Single  Row 
The  special  case  of  only  perturbing  a  single  row  of  X  was  discussed  in 
Thomas  and  Cook  (1989)  in  generalized  linear  models.  Here,  detailed  derivations 
for  regression  are  presented.  The  results  indicate  that  single  row  perturbations  have 
limited  diagnostic  value.  -        . 


3.3.1     Building  Blocks 

Without  loss  of  generality,  suppose  perturbations  are  added  to  the  k  regressor 
values  of  the  first  observation  (i.e.,  x\  becomes  x\  +  a;').  For  di  being  of  order  n, 
the  perturbed  likelihood  is 


K[e-  y)  oc  -^  loga^  -  ^(2/  -  X/3  -  d,u:'P)'{y  -  X/3  -  d.JH) 

k  k 

=  -^loga^  -  -L(y'y  -  2y,(^^,a;,)  +  (^^,a;,f). 


(3.13) 


j=i 


j=i 


Partial  derivatives  with  respect  to  w  are  as  follows: 

^^  =  -^(-2,.,.2(E/....)/3) 


(3.14) 


<•■         >.>«s 
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(b)   Umax  for  /S2 
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Figure  3.3:  School  data;  X^  +  w  perturbation;  directed  influence  plots 
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Since  the  null  perturbation  is  0,  evaluating  at  Wq  simplifies  the  expression.  We  then 
proceed  to  take  second  partial  derivatives: 

/ 


d^L^{e;y) 


duid^i 


UJ=U}o 


LV 


0         ^ 

0 

2/1  -  x[l3 

-xu^ 

0 

0         1 

d'L^{e;y) 


duda^ 


2/1  -  x[^ 


a^ 


^. 


u;=u;o 
Evaluating  at  0  =  0  and  combining  the  portions  for  the  regression  parameter 

estimates  we  obtain  .  . 


d^L^{0;y) 


dud^' 


1 


ril  -  0x[ 


dujda'^ 


ri 


i^' 


0=0,U=LJo  ^ 

where  ri  is  the  first  residual.  From  these  quantities  we  can  construct  the  A;  x  p 
matrix 


A' 


rj-^x[         -^0 


(3.15) 


The  acceleration  matrix  is  then  given  by 


F  =  2 — 


a"  [XX'Y^      0 
0' 


^n  +  ^  h3/9  +  r\{X'XY^  -  ri^x[{X'X)-'  -  n{X'X)-'xi^' 


2£l 

n 


ril  -xi$ 
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3.3.2     Directed  Influence  on  o^ 


The  acceleration  matrix  for  the  profile  likelihood  displacement  for  a   is  given 


by 


-S3 


;;.4n 


2(7 


n 


-S3' 


\r\ 


no^ 


(3.16) 


/9/9 


Solving  the  characteristic  equations  (F       —  \I)v  directly,  the  single  eigensolution  is 


4rf 


^max  =  S||/3|| 


Un 


OC^. 


Notice  how  the  curvature  depends  upon  the  residual  for  the  case  that  is  being 
perturbed,  but  the  directional  vector  is  always  proportional  to  /9  regardless  of  which 
row  of  X  is  being  perturbed.  This  suggests  that  Umax  is  not  very  useful  for  single 
row  perturbations.  The  values  of  Cmax  for  each  row  can  be  compared  to  assess  which 
row  is  most  influential  when  perturbed.  However,  the  relative  effects  only  depend 
upon  the  residuals,  indicating  there  is  little  value  added  by  plotting  the  curvatures. 
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3.3.3     Directed  Influence  on  /3j 

Without  loss  of  generality,  we  can  direct  influence  on  0^: 


Umax  oc  A'T 


oc 


ri  Jfe  -  0x[ 


1 

-7i 

Xn0 


O'k-i 


0 
-  xnl3  -  ri 

Ok-i    /  \   7i 


n  \  j  +  {xu  -  xn)0 

-7i 


rilk-i 
\ 


^-i.) 


1 

-7i 


+  :rii/3 


/ 


and 


""'"-T^llnl 


[1    -7l)'  +  (iu-a:n)3|r, 


where  Xu  is  the  predicted  value  of  Xu  from  modeling  Xi  as  a  linear  combination 
of  the  other  regressors.    More  generally,  the  maximum  local  influence  on  $j  by 
perturbing  row  i  is  -' 


'^max  —  fi 


^max  —    /^  > 


I    \^ij        XijIP 


-yj 


(1    -V.)'+(£,^.-a:,,.)^||2. 


If  the  columns  of  X  have  all  been  standardized,  then  each  of  the  elements  of  7,  and 
of  /3  is  less  than  1.  This  suggests  that  element  i  of  Vmax  is  likely  to  be  larger  than 
the  other  elements.  However,  the  signs  of  r^  and  {xij  —  Xij)$j  can  work  to  cancel 
each  other  out.  Further  interpretation  is  suppressed  in  lieu  of  the  more  general 
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Figure  3.4:  School  data;  maximum  curvatures  for  single  row  perturbations 


perturbation  presented  in  Section  3.4.   Note  that  the  curvature  and  directional 
vector  change  if  one  or  more  of  the  columns  of  X  are  rescaled. 

3.3.4  Influence  on  fi  and  6 

The  acceleration  matrix  for  directed  influence  on  ;9  is 

F^^^  =  ^  [hiM'  +  rliX'X)-'  -  n0x[{X'X)-'  -  r,iX'X)-'xi0'j  .       (3.17) 

The  eigensystem  of  this  matrix  is  unknown,  but  my  numerical  examples  indicate 
that  it  has  k  distinct  non-zero  eigenvalues.  The  eigensystem  of  F  for  the  full  set  of 
parameter  estimates  is  also  unknown. 

3.3.5  Example 

The  values  of  school  18's  regressors  are  most  influential  when  perturbing  rows 
of  X  one  at  a  time,  as  evidenced  by  a  large  maximum  curvature  (Figure  3.4).  For 
perturbing  row  18,  v^^x  is  nearly  proportional  to  ^  (Figure  3.5).  Note  that  although 
element  3  of  Vmax  appears  quite  large,  it  is  actually  less  than  the  benchmark  of 
—  =  —  —   8Q4 
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Figure  3.5:  School  data;  x[g  +  w',  Umax  for  6 


3.4     Perturbations  to  the  Entire  Design  Matrix 
At  this  point,  the  general  case  of  perturbing  all  nk  elements  of  the  design 
matrix  is  considered.  We  use  W  to  denote  the  perturbations  in  matrix  form: 


W 


Ui 

Wn+1 

^2n+l      ■ 

•  •     l^kn-n+1 

a;2 

'*^n-f2 

... 

■  ■     ^kn-n+2 

0^3 

... 

■  ■     l^kn-n+3 

Wn 

l^2n 

i^Sn 

■■      OJkn 

and  LOj  to  denote  the  f^  column  of  W.  •  . 

3.4.1     Building  Blocks 

The  derivations  in  this  section  only  require  the  presence  of  an  intercept  in 
the  model  (or  centering  of  the  response  y).  However,  for  ease  of  interpretation,  we 
assume  the  columns  of  X  have  been  centered  and  scaled.  The  perturbed  likelihood 


is 


L^iO; y)^-^\oga'-^{y-X0-  Wl3y{y  -X0-  W/3).  (3.18) 
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Proceeding  in  a  similar  fashion  to  the  derivations  for  perturbing  a  single  column, 
partial  derivatives  with  respect  to  u;  are  as  follows: 


dLUff;y)  _  1 

du:  <t2 


/3,{y-X0)-P^W0 


f3k{y-X^)-/3kW^ 
Since  the  null  perturbation  is  0,  evaluating  at  Wq  simplifies  the  expression: 


dK{0-y) 


d(Aj 


U}=(jiJo 


I3i{y-X0) 
^2(2/  -  X0) 

Pk{y-X0) 


We  then  proceed  to  take  second  partial  derivatives  and  evaluate  at  0: 


(3.19) 


(3.20) 


d'LU9;y) 


dujd/3' 


1 

0=0o,(j>J=U}o         ^^ 


1 


/ 


LV 


r     0     ...  0 

Or      0     ...    0 

0    ...  r 


\ 


^2X 


\kx  ) 


—  {Ik®r-l3®X) 


d'K{0;y) 


d(jjda'^ 


1 

0=0o,U}=U}o  ^'^ 


02r 


a' 


■(/9®r). 
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From  these  quantities  we  can  construct  the  np  x  p  matrix  A',  as  given  by  Cook 
(1986): 


A'  =  ^ 


a^ 


Ik®r-P®X 


U^®r) 


(3.21) 


where  ®  denotes  the  Kronecker  product  (Searle  1982). 

The  next  step  is  to  construct  the  acceleration  matrix.  Here  we  use  the  fact 
that  {A  ®  B){C  ®  D)  =  AC  ®  BD,  assuming  comformabiHty  of  the  matrices. 
Noting  that  A  =  A®\  =  1®A,  the  acceleration  matrix  can  be  written  as 


F  =  —  (jfc  ®  r  -  ^  ®  X)  {X'Xy^  (ik  ®  r'  -  /3'  ®  X')  +  -^  (^^'  ®  rr'^ 
=  ^(^Ik®r-$®X^  ((X'X)-i  ®r'-0'®  (X'X)-^X') 
(^^'®rr'y  ; 

(3.22) 


na^ 


3.4.2     Directed  Influence  on  q-^ 

The  acceleration  matrix  for  the  profile  likelihood  displacement  for  &^  is  the 
second  term  of  (3.22): 


(3.23) 


This  matrix  has  a  single  non-zero  eigenvalue,  ^||/3|p,  with  corresponding 
eigenvector  oc  0  ®r.  This  solution  is  confirmed  below: 


{f"'^  -  XI)v  a 


i-M'®rr'-^\m'l 


na 


a'- 


i0®r) 


na^ 


na^ 


^0  ®rr'-\m''\\rfl\00r) 
\m'0®\\r\\'r-0f\\r\\'0®r) 


(3.24) 


=  0. 
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This  result  implies  maximum  local  influence  on  a^  is  achieved  by  perturbations  in 
each  column  that  are  proportional  to  the  residuals  scaled  by  the  corresponding  ^j. 

3.4.3     Directed  Influence  on  /3 

In  this  section,  we  derive  the  full  eigensystem  of  F    ,  where 

••  Iff] 

The  following  theorem  provides  the  full  eigensystem  of  F     for  the  X  +  W 

perturbation  in  a  regression  analysis.  Note  that  eigenvalues  in  the  theorem  appeared 

in  Cook  (1986),  but  the  eigenvectors  appear  here  for  the  first  time. 

Theorem  3.1 

Recall  that  (pj  is  the  j*'^  eigenvector  of  X'X  and  that  X(pj  is  the  f^  principal 

component.  Letting  Sj  denote  the  j'^^  eigenvalue  of  X'X,  the  k  non-zero  eigenvalues 

and  eigenvectors  of  F     are 


\  =  2(t +  ^)  v^  ex  ^fc_,.+i  ®  r  -  y3  0  X<p,_^+i,  (3.25) 

Ofc-i-l-l  o 


for  j  =  l,...,k. 

Proof  of  Theorem  3.1         " ^     >",    i^. 

Because  the  algebraic  expressions  are  so  long,  the  eigensolutions  are  confirmed  in 

two  steps.  Noting  that  j-^ —  and  Vfc-j+i  ^^"^  the  /'*  eigensolution  of  (X'X)~\  we 
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have 


2  r 


^  [(jfe  ®  r  -  ;9  ®  X)  ((X'X)-i  ®  r'  -  /3'  O  (X'Xj-^X')]  (v?,.  0  r) 


=  ^[(/.®r-;3®x)(tE!<^,._o)]-A(H!  +  ||^||2)(^.^^) 


(3.26) 


2  riM'2 


^U^lV'.^r-^^X-^,.) 


2  Jlr 


C/  t/'j 


■Pir(¥',-®r)-^(;9®X<^,.) 


The  following  matrix  multipUcation  yields  the  same  expression  as  (3.26): 


F^^^-2(^  +  *)/j(/3^X^,) 
=  4  [(jfe  ®  r  -  ^3  ®  X)  ((X'X)-i  0  r'  -  3'  ®  (X'X)-^X')]  (3  O  Xip^) 


=  ^[(lfe®r--^®x)(0-||/3||Vj) 
2 


2 


ll/3|r(-V,-®r  +  (/3®X<^^.) 


^(JJ^  +  ||^ir)(^®Xv.,) 


-||y3|p(VP,.®r)-\^(3®X<^,.) 


(3.27) 


Using  (3.26)  and  (3.27),  confirmation  of  the  /"  eigensolution  is  immediate: 

Ofc-j+1  cr 


(3.28) 


=  0. 


Finally,  for  j  /  j*,  the  orthogonality  of  the  solutions  follows: 

{(Pj®r-^®Xipj)'{ipj,®r-^^X(fij,) 

=  'p'j'Pj*  ®  r'r  -  3V;*  ®  'PjX'r  -  cp'.0  ®  r'X<p^,  +  0^  (8>  if'jX'Xipj, 

=  0-0-0  + 3'/9®  Vj-Vj* 
=  0.     D 


3.4.3.1     Interpreting  Theorem  3.1 

The  closed- form  eigesolution  in  Theorem  3.1  gives  insight  into  how  slight 
changes  to  the  observed  explanatory  variables  affect  inference.  To  interpret  the 
theorem,  consider  maximum  curvature,  as  per  (3.25): 

C„.ax  =  2(^  +  H!).  (3.29) 

Ok         (^^ 

The  eigenvalues  of  X'X  are  traditionally  used  to  assess  the  amount  of  collinearity 
in  the  regressors.  If  5k  is  small,  there  is  redundancy  in  the  explanatory  variables, 
and  this  causes  the  inversion  of  X'X  to  be  unstable.  If  this  inversion  is  unstable, 
so  too  are  the  regression  parameter  estimates,  $  =  {X'X)~^X'y.  The  first  term  in 
Cmax  is  large  when  6k  is  small,  giving  further  evidence  that  the  estimates  are  more 
sensitive  when  collinearity  is  present.  The  second  term  of  Cmax  is  also  interesting. 
It  can  be  considered  a  naive  measure  of  goodness  of  fit;  when  the  relationships 
between  the  explanatory  variables  and  y  are  strong,  ||)9|p  is  large  and  a^  is  small. 
Hence,  "better  fitting"  models  are  more  sensitive  to  perturbation.  However,  ||)9|p/6-^ 
typically  grows  whenever  additional  variables  are  added  to  the  model,  indicating 
more  complicated  models  are  also  more  sensitive  to  pertubation.  In  fact,  ||/3|p/(7^ 
bears  some  similarity  to  R"^: 


SSy  y'y  n((T2  +  ||X^||2) 
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The  quantity  B?  is  the  proportion  of  variance  explained  by  the  model,  and  serves 
as  a  measure  of  goodness-of-fit.   However,  B?  can  be  made  artifically  large  by 
adding  additional  explanatory  variables,  even  when  they  contribute  very  little  to  the 
explanatory  value  of  the  model.  The  quantity  ||/9||^/(7^  behaves  similarly:  additional 
variables  decrease  the  denominator  and  usually  increase  the  numerator.  Of  course, 
the  scale  of  the  regressors  is  a  major  factor  in  the  size  of  \\0\\^,  indicating  that 
scaling  the  regressors  would  be  prudent  for  interpretation. 

So,  we  may  interpret  Cmax  in  the  following  way.  The  vector  of  parameter 
estimates  is  more  sensitive  when  (a)  there  are  collinear  variables,  (b)  there  are  a  large 
number  of  explanatory  variables,  and  (c)  the  model  closely  fits  the  data.  We  are  not 
surprised  to  see  that  the  first  two  phenomenon  are  sources  of  sensitivity.  However, 
the  suggestion  that  better-fitting  models  are  more  sensitive  to  measurement  error 
may  at  first  be  disconcerting.  When  viewed  in  light  of  Type  I  and  Type  II  errors, 
this  is  not  the  case.   Recall  that  a  Type  I  error  occurs  when  a  test  is  wrongly 
declared  significant,  while  a  Type  II  error  occurs  when  a  test  is  wrongly  declared 
insignificant.  Generally,  a  Type  I  error  is  considered  the  worse  of  the  two.  So,  if 
there  are  strong  relationships  between  X  and  y,  the  formula  for  Cmax  implies  this 
might  be  missed  because  of  measurement  error  in  the  regressors.  However,  if  there 
is  a  weak  relationship,  perturbations  to  X  would  not  make  the  relationship  appear 
strong.  In  this  sense,  the  regression  model  is  better  protected  against  Type  I  errors 
(concluding  there  are  strong  relationships  when  there  are  none)  than  Type  II  errors 
(concluding  there  are  weak  relationships  when  in  fact  there  are  strong  ones)  when 
there  is  measurment  error  in  the  regressors. 

As  a  final  comment  on  Cmax,  the  term  W^W^/a^  may  be  better  described  as 
an  indicator  of  a  closely  fitting  model  rather  than  a  well  fitting  model.  This  suggests 
that  if  we  construct  complicated  models  to  fit  the  nuances  of  a  data  set,  the  resulting 
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parameter  estimates  are  more  sensitive  to  rounding  and  measurement  errors  in  the 
explanatory  variables. 

Now  let  us  consider  Umax  as  per  Theorem  3.1: 

Umax  OC  V'fc^r-^^OXyJfc.  (3.30) 

••  r/?i 
This  vector  and  the  other  eigenvectors  of  F     provide  detailed  information  about 

which  elements  of  X  most  affect  the  stability  of  the  MLEs.   The  expression  for 

Umax  shows  how  outUcrs  and  leverage  in  the  final  principal  component  regressor 

determine  precisely  what  kinds  of  mismeasurements  have  large  local  influence  on  $. 

The  directional  vector  can  be  used  to  scale  W  to  achieve  maximum  local  influence: 


yy  max  OC 


ifkir  -  ^iX(pk        ipk2r  -  $2X(pk  "■     ...        (pkkV  -  PkXipk 


.    (3.31) 


The  elements  in  the  first  column  are  a  weighted  average  of  the  residuals  (r)  and  the 
regressor  values  of  the  last  principal  component  regressor  (Xy>^).  The  respective 
weights  are  the  first  original  regressor's  contribution  to  the  final  principal  component 
{(Pki)  and  the  negative  of  the  first  regression  parameter  estimate  {—$i). 

Alternatively,  we  can  think  of  the  elements  in  the  first  row  as  being  a 
weighted  average  of  the  explanatory  variables'  contributions  to  the  final  principal 
component  {(p'^)  and  the  negative  of  the  parameter  estimates  {0  ).  The  respective 
weights  are  the  first  residual  (ri)  and  the  negative  of  the  first  observation's  value  for 
the  final  principal  component  {—x[(p^.). 
3.4.3.2    A  special  case 

In  order  to  examine  in  more  detail  how  the  perturbations  given  in  (3.31) 
affect  $,  we  consider  a  simple  situation  for  which  there  are  closed- forms  for  the 
perturbed  parameter  estimates.  Let  us  assume  that  the  response  and  a  pair  of 
regressors  have  been  centered  and  scaled  to  have  length  one,  and  that  the  regressors 
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are  orthogonal.  The  implications  of  these  simplifications  are  as  follows: 


y  = 

\\Xi     =     X2I    =  1 

r'r  = 

l-/5i^- 

-/?! 

A  = 

x'lV 

$2  = 

x'^y 

5i  = 

5r  =  l 

(p  = 

I2 

The  first  two  eigenvalues  of  F  are  the  same: 


and  we  may  choose  either  of  the  following  vectors  for  Vmax^ 


r  -  AXi 
-p2Xi 


-^1X2 
r  -  /^2X2 


Without  loss  of  generality  we  choose  the  first  one.  Letting  X^^  denote  the  perturbed 
design  matrix,  we  have 


X^  =  X+aW 


=  X  +  a 


r  -  /3iXi         -/92X1 


(3.32) 


ar  +  (1  -  aPi)Xi         X2  -  a$2Xi 
Xi^u    -^2,0; 

for  some  real  number  a.  Now,  consider  the  effects  of  perturbation  on  /?i.  Assuming 
a  no-intercept  model  is  still  fit  after  perturbation,  the  perturbed  estimate  is  given  by 

r'i,uy 


Pl,.= 


\ri,u 


(3.33) 
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where  ri,a,  is  the  residuals  from  fitting  Xi^^  as  a  function  of  X2,a;-  The  building 
blocks  needed  are  as  follows: 


J^2,(j  —  -^2,w(-X^2,t.;-^2,w)       ^2,, 


1 


(X2-a/32Xi)(X'2-a/32Xl) 


1  +  a2/9| 
ri,^  =  (/  -  H2,<^)Xi,^  =  (1  -  a/3i  -  acp2)Xi  +  0X2  +  ar 

r[,^y  =  (1  -  a/^i  -  ac$2)Pi  +  0^2  +  a(l  -  $1  -  $1) 
|ri,J|2  =  (1  -  aA  -  ac/52)'  +  c^  +  a'{l  -  Pf  -  4^), 


where  c  =  "^f't'l  It  follows  that 

A      ^  (1  -  aA  -  QC;92)/gi  +  c^2  +  q(1  -  /3f  -  /gp 
^'■"        (1  -  a$,  -  ac/92)2  +  c2  +  a2(l  -  pf  -  /5|)  ' 

For  the  sake  of  comparison,  we  can  also  consider  perturbing  only  the  first 
column.  Perturbation  in  the  direction  of  Umax  when  directing  effects  on  0i  results  in 
the  following  perturbed  X  matrix: 


X^  =  X  +  a*W* 


=  X  +  a* 


r  -  /5iXi        0 


(3.34) 


a*r  +  {l-a*Pi)Xi         X2 
as  per  (3.8).  Here,  the  perturbed  estimate  is  given  by 

>:..:  ^      ^(l-a*A)/?i+a*(l-/^?-4') 

(l-a*A)2  +  a*2(l-/52_42)- 

In  order  for  the  two  perturbations  to  be  comparable,  they  should  be  equal  in  size. 
This  is  accomplished  by  stacking  the  columns  of  the  perturbation  matrices  and 
making  the  resulting  vectors  equal  length.  In  other  words,  the  Froebenius  norms 


of  aW  and  a*W*  should  be  equal.   Letting  a* 
adjustment. 


V^ 


provides  the  necessary 


Now  we  can  plot  ^x^^  for  some  selected  values  of /Sj  and  $2,  considering  values 
of  a  ranging  from,  say,  —  V^  to  \/2.   This  eliminates  perturbation  matrices  with 
Froebenius  norms  that  are  larger  than  that  of  the  unperturbed  X  matrix.  Figure  3.6 
shows  plots  of  ^i^ij  for  four  sets  of  parameter  estimates.  The  solid  lines  represents  the 
parameter  estimate  in  the  direction  Umax  for  the  two-column  perturbation,  while  the 
dashed  line  represents  parameter  estimates  in  the  direction  Vmax  for  the  one-column 
perturbation.  All  of  the  influence  curves  in  plots  (a)  and  (b)  have  a  value  of  .2  under 
the  null  perturbation,  i.e.  when  a  =  0.  Likewise,  the  curves  in  plots  (c)  and  (d)  are 
all  equal  to  .5  at  the  null  perturbation.  In  all  plots  we  see  that  the  single  column 
perturbation  does  not  have  the  same  effect  as  the  dual  column  perturbation.  The 
discrepancy  between  their  effects  is  larger  when  /?2  is  larger  (plots  (b)  and  (d)). 

3.4.4    Influence  in  Principal  Components  Regression 

Consider  the  canonical  form  of  the  model,  as  given  in  Chapter  1: 

Y  =  Zoi  +  e,  (3.35) 

where  Z  =  X(p  and  a  =  (fi'/3.  The  following  theorem  proves  that  the  direction  of 
maximum  curvature  is  the  direction  of  maximumum  directed  influnce  on  Qjt. 
Theorem  3.2 

Let  a.  denote  the  MLEs  of  oc  for  model  (3.3b).  The  pairs  Xj,  Vj  given  in  Theorem 
3.1  are  the  maximum  curvature  and  direction  of  maximum  curvature  for  directed 
influence  on  cik-j+i,  j  =  1, . . . ,  fc. 
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=  .5, 


Figure  3.6:    Effects  of  single  and  double  column  perturbations  on  ^i.    Solid  line- 
double  column;  dashed  line-  column  1  only. 
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Proof  of  Theorem  3.2   We  begin  by  obtaining  update  formulas  for  A  and  L      as 


per  Section  2.7: 

■; 

'-'    a'=a/^ 

,-                 ■    ■■"?>••■ 

V.  ■   .:-  =A>^ 

=  (^l^\X'X)<p) 

^a''D{l/6). 

(3.36) 


(3.37) 
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Next,  we  find  maximum  directed  influence  on  dfc-j+i  using  the  results  of  Beckman 
et  al.  (1987): 


Vmax  OC  KTa,^j+i 


oc 


(fi<S)r  —  0  (S)  X(p 


0 

& 

\/*fc-j+i 

0 


oc  (fik-j+i  ®r-P®  Xipk^j^i 


Cmax  —  2||AQTa^__^._^J 


^. (vLi+1  ®  r'  -  ^'  (g)  v5'fc_,.+iX')(<^fc_,+i  O  r  -  3  ®  Xcp^_^^,) 

-^r^— (1  ®  r'r  -  \m'  0  ^',_.^,X'X^,_^^,) 


^=^<^fc- 


j+i 


=2(^.m 


4-j 


/+i 


Since  these  expressions  are  the  j*''  eigensolution  of  F  \  the  proof  is  complete.        D 

This  theorem  gives  added  insight  into  the  direction  of  maximum  curvature. 
In  short,  v^^  is  the  direction  that  changes  ay,  the  most.  Similarly,  the  /''  largest 
eigenvalue  and  associated  eigenvector  specifically  correspond  to  eflfects  on  afc-i+i- 
Also,  the  sizes  of  the  curvatures  show  that  the  estimated  coefficients  for  the  most 
important  principal  component  regressors  are  also  the  least  sensitive. 
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Example  1 

Consider  the  Scottish  hills  data  set  described  in  Chapter  1.  The  purpose  of 
this  example  is  twofold    to  demonstrate  how  the  X-matrix  results  can  be  displayed 
graphically,  and  to  demonstrate  how  collinearity  affects  the  curvature  values  when 
examining  influence  on  $. 

Figure  3.7  gives  the  direction  of  maximum  curvature  for  directed  influence  on 
a^.  The  first  35  elements  correspond  to  perturbations  of  the  DISTANCE  regressor 
values,  while  elements  36-70  correspond  to  perturbations  of  the  CLIMB  regressor 
values.    The  value  for  race  18's  DISTANCE  is  the  largest  contributer  to  this 
influential  perturbation.  Elements  7  (DISTANCE  for  race  7)  and  53  (CLIMB  for 
race  18)  are  also  large 

The  information  in  Figure  3.7  can  be  difllicult  to  discern  because  the 
perturbation  vector  enters  into  the  likelihood  in  the  form  of  a  matrix,  W.  Another 
way  to  display  Umax  is  to  first  take  its  absolute  value  and  then  to  consider  the 
elements  corresponding  to  each  column  of  W.  Each  portion  can  then  be  plotted 
against  indices  1, . . .  ,n  and  shown  together  (Figure  3.8(a)).  This  graphical  display 
provides  a  "profile  plot"  of  each  column  of  W's  contribution  to  the  direction  of 
maximum  curvature.  Here  we  see  again  that  elements  corresponding  to  observations 
18  and  7  as  being  the  most  influential.  There  is  also  some  evidence  that  measurement 
error  in  the  DISTANCE  values  (solid  line)  are  more  influential  than  error  in  the 
CLIMB  values  (dashed  line). 

Figures  3.8(b)  and  3.8(b)  provide  profile  plots  of  the  absolute  value  of  the 
two  directional  vectors  corresponding  to  Vi  and  V2  of  F    .  The  first  plot  draws 
attention  to  observations  7,  11,  and  18,  while  no  observations  appear  noteworthy  in 
the  second  plot. 

At  this  point  we  turn  our  attention  to  the  curvature  values  that  accompany 
these  directional  vectors  (Table  3.3).  The  first  column  shows  the  curvatures  for  local 
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Table  3.3:    Hills  data;  curvatures  using  X  +  W  perturbation.     Model  I  contains 
DISTANCE  and  CLIMB.  Model  II  contains  DISTANCE  and  CLIMB-RATE 


Estimates  in  LD{u:) 

or  LD^^'^iv) Model  I        Model  II 

d^  Ai  =  30.47     Ai  =  28.52 


Ai=21.15       Ai=16.41 
A2=16.49       A2=16.25 


influence  when  fitting  the  regression  model  for  winning  time  as  a  linear  combination 
of  DISTANCE  and  CLIMB.  However,  as  was  evident  in  Figure  1.5(c),  the  two 
regressors  are  linearly  related.  An  alternative  model  can  be  fit  by  replacing  climb  in 
feet  with  climb  in  feet  per  distance  travelled,  CLIMB-RATE  .  This  regressor  does 
not  have  the  strong  positive  association  with  distance  that  the  original  coding  of 
CLIMB  has.  This  should  reduce  the  coUinearity  in  the  model,  thereby  reducing 
sensitivity  to  perturbations  in  the  X  matrix. 

Using  standardized  regressors,  the  new  model  is  fit.  Column  2  of  Table  3.3 
shows  the  curvature  values  for  this  new  model.  Cmax  for  directed  influence  on  ^ 
has  been  substantially  reduced,  indicating  that  the  MLEs  for  the  second  model  are 
indeed  less  sensitive  to  perturbations  in  the  X  matrix. 

Figures  3.9(a),  (b)  and  (c)  provide  the  profile  plots  of  the  absolute  value 
of  the  directional  vectors  that  accompany  the  curvatures  in  Table  3.3,  Column  II. 
All  three  plots  tend  to  indicate  DISTANCE  values  as  being  more  influential  than 
CLIMB-RATE  values,  and  all  three  plots  show  substantial  change  from  those  in 
Figure  3.8.  The  plot  for  a"^  now  indicates  the  DISTANCE  value  for  observations  7 
and  18  as  influential,  rather  than  just  18.  No  elements  stand  out  in  Vmax,  while  case 
11 's  DISTANCE  value  is  a  very  substantial  contributer  to  t;2- 
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Figure  3.7:  Hills  data  X  +  W  pert.;  Vmax  for  a^ 
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Example  2 

We  now  consider  applying  the  X  +  W  perturbation  to  the  school  data  and 
examining  directed  influence  on  0.  The  purpose  of  this  example  is  to  demonstrate 
alternative  methods  for  displaying  Umax  when  the  number  of  regressors  and/or 
observations  is  large. 

Figure  3.10  is  the  plot  of  Umax  for  directed  influence  on  0.  Elements  1-20 
correspond  to  SALARY,  elements  21-40  correspond  to  WCOLR,  elements  41-60 
correspond  to  SES,  elements  61-80  correspond  to  TSCORE,  and  elements  81-100 
correspond  to  MOMED.  The  two  regressors  that  have  little  association  with  other 
explanatory  variables,  SALARY  and  TSCORE,  have  small  elements.  In  contrast, 
some  of  the  elements  for  the  other  regressors  are  more  substantial.  This  plot  does  a 
fair  job  of  highlighting  regressors,  but  a  poor  job  of  highlighting  observations. 

Figure  3.11  is  the  profile  plot  of  |umax|-  It  is  easy  to  see  that  cases  3  and  18 
have  multiple  regressor  values  that  are  influential.  In  addition,  one  of  the  values  for 
case  2  is  influential.  However,  is  to  difficult  to  keep  track  of  which  line  represents 
which  column.  Another  option  is  to  plot  "profiles"  of  each  row,  as  in  Figure  3.12. 
Here,  some  of  the  elements  of  observations  2,  3,  and  18  are  highlighted. 
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(a)  l^maxl  for  a^ 
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Figure  3.8:  Hills  data;  X  ^W  pert.;  column  profiles  of  directional  vectors.  DIS- 
TANCE values  are  connected  by  solid  line.  CLIMB  values  are  connected  by  a  dashed 
line. 
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(a)  It^maxl  for  u 
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(b)  l^maxl  for  ;9 
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Figure  3.9:  Hills  data;  Model  II;  X  + IF  pert.;  column  profiles  for  directional  vectors. 
DISTANCE  values  are  connected  by  solid  line.  CLIMB-RATE  values  are  connected 
by  a  dashed  line. 
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Figure  3.10:  School  data;  X  +  W  pert.;  Umax  for  directed  influence  on  $ 


Figure  3.11:  School  data;  X+W  pert.;  Column  profiles  of  |umax|  for  directed  influence 
on  0.  Thick  solid  line-  SALARY,  thin  solid  line-  WCOLR,  dot-dashed  line-  SES, 
small-dashed  line-  TSCORE,  large-dashed  line-  MOMED. 


85 


0.4 

18  a 

18  . 

0.3 

rK 

\    ^A 

/ 

/  y 

0.2 

// 

0.1 

\////^ 

^^^^ 

^^m 

2 


3 


Figure  3.12:  School  data;  X  +  W  pert.;  row  profiles  of  \vj_ 
on  ^  with  large  elements  labeled. 


for  directed  influence 


A  bubble  plot  of  |  WmaxI  provides  the  best  graphical  display  of  the  information 
in  I'max  (Figure  3.13).  Each  element  is  represented  by  a  point  plotted  according  to 
both  the  corresponding  regressor  (x-axis)  and  case  (y-axis).  Around  each  point  is  a 
bubble  with  radius  equal  to  the  absolute  value  of  the  element,  and  so  large  bubbles 
indicate  large  contributors  to  the  direction  of  maximum  curvature.  Again,  we  see 
that  the  elements  corresponding  to  Xi8,2,  a;i8,5,  0:3,2,  2:3,5,  and  0:2,3  have  the  largest 
influence,  as  evidenced  by  having  the  largest  bubbles. 
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Figure  3.13:  School  data;  X  +  W  pert.;  bubble  plot  of  l^maxl  for  ^ 


CHAPTER  4 
LOCAL  ASSESSMENT  FOR  OPTIMAL  ESTIMATING  FUNCTIONS 

It  is  possible  to  examine  the  local  influence  of  perturbations  on  measures  other 
than  the  likelihood  displacement,  LD{u}),  and  the  profile  likelihood  displacement, 
I,Z)[^i](a;).    Several  authors  have  applied  local  influence  techniques  to  specific 
measures  of  influence.   For  instance,  Lawrance  (1988)  and  Wu  and  Luo  (1993c) 
looked  at  influence  on  the  Box-Cox  transformation  parameter.   Also,  influence 
has  been  examined  on  the  residual  sums  of  squares  in  regression  (Wu  and  Luo 
1993b),  deviance  residuals  and  confidence  regions  in  GLMs  (Thomas  and  Cook  1990; 
Thomas  1990),  and  goodness-of-link  test  statistics  in  GLMs  (Lee  and  Zhao  1997). 
However,  considerably  less  attention  has  been  given  to  assessing  local  influence 
when  estimation  is  not  performed  by  ML.  Some  work  has  been  done  on  extentions 
to  Bayesian  analysis  (McCuUoch  1989;  Lavine  1992;  Pan  et  al.  1996),  and  Lp  norm 
estimation  (Schall  and  Gonin  1991). 

In  this  chapter,  local  influence  techniques  are  extended  in  two  ways.  First, 
we  consider  the  influence  of  perturbations  on  a  general  class  of  influence  measures. 
Second,  we  address  parameter  estimates  that  are  found  via  estimating  equations. 
The  intent  is  to  unify  nearly  all  previous  work  on  this  methodology  and  provide  a 
conceptual  basis  and  computational  tools  for  future  applications.  The  first  three 
sections  outline  the  inferential  platform,  a  general  class  of  influence  measures, 
and  assumptions  about  the  perturbations.  Next,  techniques  for  local  assessment 
of  perturbations  are  discussed.   In  Section  4.4,  we  give  some  algebraic  results 
for  the  two  main  diagnostic  quantities:  the  gradient  vector  and  the  acceleration 
matrix.  Section  4.5  contains  computational  formulas,  while  Sections  4.6-4.8  contain 
applications.  Several  measures,  including  LD{(jj)  and  LD^^^^u)),  are  considered  in 
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section  4.6  using  ML.  Section  4.7  addresses  influence  on  the  generalized  variance 
with  an  emphasis  on  ML.  Finally,  the  last  section  considers  quasi-likelihood. 

4.1     Inferential  Platform 
This  section  outlines  assumptions  about  estimation,  the  nature  of 
perturbations,  and  how  the  effects  of  perturbation  are  measured. 

4.1.1     Estimating  Functions 

Suppose  that  a  statistical  analysis  is  performed  to  estimate  p  parameters, 
0.  Further,  suppose  that  point  estimates,  0,  are  obtained  as  solutions  to  a  set  of  p 
unbiased  estimating  equations, 

^^'   ■  ■      \    *  h{e-y)=0,  (4.1) 

where  for  each  estimating  function^  hi{0;y), 

■    \     '■  *    Eg{hi{9;y))  =  0  i  =  l,...p. 

Sometimes  these  estimating  equations  arise  from  minimizing  or  maximizing  some 
criterion  with  respect  to  B.  Examples  include  least-squares  normal  equations,  score 
equations  and  equations  resulting  from  minimizing  a  Bayes  risk.  Quasi-likelihood 
(QL)  equations  (Wedderburn  1974)  were  originally  motivated  by  maximizing  a 
quasi-likelihood  function,  but  the  existence  of  the  objective  function  is  not  necessary. 
Generalized  estimating  equations  (GEEs)  (Liang  and  Zeger  1986),  like  QL  equations, 
merely  require  assumptions  about  first  and  second  moments. 

The  theory  of  estimating  equations  (EEs)  dates  back  to  early  work  by  V.  P. 
Godambe  (Godambe  1960;  Godambe  1976).  There  has  been  renewed  interest  in  the 
subject  due  to  new  estimation  techniques  for  semi-parametric  models  and  models  for 
clustered  non-normal  responses  (Zeger  and  Liang  1986;  Prentice  1988;  McCullagh 
and  Nelder  1989;  Desmond  1997a).  A  considerable  amount  of  work  has  centered 
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on  obtaining  optimal  estimating  functions  (OEFs)  (Godambe  1976;  Crowder  1986; 
Crowder  1987;  Godambe  and  Heyde  1987;  Firth  1987;  Godambe  and  Thompson 
1989;  Godambe  1991).  To  this  end,  the  efficiency  matrix  of  a  set  of  p  estimating 
functions  is  defined  as 


Eff(/i)  =  (E 


dh/_ 
dG 


)-'W[h]{E 


dh 
89' 


)-\  (4.2) 


Functions  that  minimize  this  non-negative  definate  matrix  are  considered  optimal. 
Roughly  speaking,  this  index  of  efficiency  is  small  when  the  variance  of  the 
estimating  functions  is  small  and  the  average  gradient  is  large.  Provided  that  the 
estimating  equations  arise  from  maximizing  or  minimizing  some  function,  we  may 
interpret  that  large  gradients  indicate  the  function  is  not  flat  around  its  extremum. 

Assuming  that  a  fully  parameteric  form  for  the  data  is  assumed,  the  score 
functions  are  the  unique  OEFs  up  to  a  scalar  multiple  (Godambe  1960;  Bhapkar 
1972).  However,  with  fewer  assumptions  about  the  data,  it  is  generally  not  possible 
to  find  a  unique  set  of  optimal  estimating  functions.  Instead,  optimality  is  bestowed 
upon  estimating  functions  within  a  certain  class.   One  such  class  of  estimating 
functions  are  those  that  are  linear  functions  of  the  data. 

Under  mild  regularity  conditions,  the  asymptotic  variance  of  0  is  in  fact  given 
by  (4.2)  (Morton  1981;  McCullagh  1983;  Crowder  1986).  Thus,  minimizing  Eff(/i) 
is  equivalent  to  minimizing  the  asymptotic  variance  of  the  parameter  estimates. 
Note  that  the  first  interpretation  of  the  efficiency  matrix  is  appealing  in  that  it  is  a 
finite  sample  index,  but  somewhat  awkward  in  that  it  is  based  on  a  property  of  the 
estimating  functions.  The  second  interpretation  is  appealing  in  that  it  is  based  on  a 
property  of  the  estimates,  but  limited  in  that  it  requires  asymptotics. 

For  the  purposes  of  applying  local  influence  diagnostics  we  assume  only  that 
the  estimating  functions  have  been  standardized.  That  is,  given  a  set  of  unbiased 
estimating  functions,  h*{d\  y),  we  utilize  the  standardized  estimating  functions  given 
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by 


hi0;y)  =  -E 


dh*' 


{W[h*])-'h\  (4.3) 


This  standardization  has  three  implications.  First,  \{h{0;y))~^  is  equivalent 
to  (4.2)  and  hence  equivalent  to  the  asymptotic  variance  of  the  parameter  estimates. 
In  addition,  the  standardized  estimating  functions  are  optimal  in  the  following  sense. 
Given  a  set  of  unbiased  estimating  functions,  {/i-(0;y)},  the  OEFs  amongst  the 
class  of  linear  combinations  of  the  h\  are  given  by  (4.3).  Hence,  the  standardized 
estimating  functions  are  an  optimal  rescaling  of  the  original  estimating  functions. 
Note  that  the  negative  sign  is  present  in  (4.3)  so  that  the  score  equations,  rather 
than  the  negative  of  the  score  equation,  are  considered  standardized.    Finally, 
standardized  estimating  functions  are  "information  unbiased"  (Lindsay  1982), 
meaning  that  the  following  equality  holds: 

dh 


-E 


=  E[hh']  =  V[/i].  (4.4) 


d0\ 

Hence,  V[^]  =  —E[dh/d0'].  This  fact  is  used  later  in  the  chapter  to  derive  influence 
diagnostics  for  confidence  ellipsoids. 

4.1.2    Perturbations 

Perturbations  are  again  introduced  as  a  vector  of  q  real  numbers,  a;.  These 
perturbations  may  affect  the  data,  the  model  and  its  assumptions,  or  perhaps  even 
just  the  estimation  technique  itself.  Examples  of  data  perturbations  are  minor 
changes  or  rounding  errors  in  the  response  and  the  explanatory  variables.   An 
example  of  model  perturbation  is  the  variance  perturbation  scheme  used  in  Chapters 
1  and  2.  Finally,  perturbing  the  contribution  of  each  observation  to  the  estimating 
functions  by  case-weights  is  an  example  of  perturbing  the  estimation  technique. 
This  perturbation  does  not  affect  the  distributional  assumptions  of  the  data,  but 
rather  how  each  observation  is  utilized  for  inference.  Another  perturbation  to  the 
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inference  method  would  be  to  allow  the  value  of  the  hyper-parameters  in  a  Bayesian 
analysis  to  vary.  Though  some  might  argue  that  the  hyper-parameters  are  part  of 
the  model,  their  values  are  often  chosen  for  computational  convenience. 

For  the  purposes  of  this  chapter,  we  assume  that  the  perturbation  scheme 
results  in  a  set  of  p  perturbed  estimating  equations, 

'    K{0;y)=O,  (4.5) 

-•V.  -      "  \-"  V     ." 

which  have  solutions  0^.  It  is  also  assumed  that  there  exists  a  null  perturbation,  Wq, 
whose  estimating  equations  are  identical  to  the  original  ones.  Finally,  it  is  assumed 
that  there  are  no  constraints  on  cj  that  would  limit  their  values  in  a  neighborhood  of 
Wo-  For  example,  constraining  the  space  of  variance  perturbations  to  positive  values 
is  allowed.  However,  constraining  the  perturbations  to  sum  to  one  is  not  allowed. 
These  kinds  of  constraints  could  be  accommodated,  but  are  not  considered  here. 

4.1.3    Influence  Measures 

The  final  item  to  specify  is  a  quantity  whose  sensitivity  to  perturbation  is  of 
interest.  This  influence  measure  serves  as  a  barometer  for  how  perturbation  effects 
inference.  The  measure  can  depend  on  w  directly  and  indirectly  via  some  r  x  1 
vector  of  functions  of  the  perturbed  parameter  estimates,  7(^0))-  Some  influence 
measures,  m{u},'y{0^)),  may  be  appropriate  regardless  of  the  method  of  estimation 
and  the  perturbation,  while  others  may  be  more  application-specific. 

Examples  of  rather  general  objective  measures  are  the  following: 

..,      .  mi  =x'0^  (4.6) 

m,  =  -l^  (4.7) 
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where  E^{yi)  =  fii^.  None  of  these  three  measures  depend  on  w  directly.  The  first  is 
a  linear  combination  of  the  perturbed  parameters.  Here,  ^{0^)  =  x'O^  is  a  scalar. 
The  second  measure  is  the  ratio  of  the  i*''  unperturbed  parameter  estimate  to  the  z*'' 
perturbed  estimate.  Again,  7(^0;)  =  ^i„  is  a  scalar.  The  third  measure  resembles  a 
Pearson  goodness-of-fit  statistic  based  on  the  perturbed  estimates.  Since  it  is  not  yet 
specified  how  the  means,  /ij^,  depend  on  the  estimates,  we  simply  write  7(^0;)  —  0^. 
The  following  measures  are  more  application-specific,  in  that  they  depend  on 
how  inference  is  performed  and  perhaps  even  upon  the  perturbation  itself: 

m,  =  LD{uj)  =  2\L{e-  y)  -  L{9^-  y)]  "  (4.9) 

m5  =  LD[^^Hu;)  =  2[L(0i,^2;?/)-m.;,^(^iJ;y)]  (4.10) 

m6  =  LD*(a;)  =  2[L,(0,;y)-L,(^;y)]  (4.11) 


ms  =  log 


log(^^^) 

,     f{y\e^y_ 


(4.12) 
(4.13) 


The  first  of  these  measures  is  the  likelihood  displacement,  for  which  7(^0,)  =  Ouj- 
The  second  is  the  profile  likelihood  displacement,  which  has  7(^0^)'  =  (^iw)5(^iw)')) 
where  g{6iu)  maximizes  the  profile  likelihood  of  Bi^^  with  respect  to  62.  The  third 
measure  is  a  likelihood  displacement  with  a  "moving  frame"  because  the  base 
likelihood  of  the  measure  is  the  perturbed  one.  Obviously,  these  three  measures  are 
intended  for  use  when  estimation  is  performed  by  maximum  likelihood. 

The  influence  measure  rrij  is  a  Kullback-Leibler  divergence  between  the 
densities  f{y  \  9)  and  f{y  \  OJ).   The  quantity  is  non-negative,  achieving  its 
minimum  value  of  zero  when  the  two  densities  are  equivalent.  Here,  7(^0;)  =  ^w 

The  last  measure  addresses  sensitivity  of  asymptotic  standard  errors.  Recall 
that  the  determinant  of  a  matrix  equals  the  product  of  its  eigenvalues,  and  when 
the  matrix  is  a  variance-covariance  matrix  for  a  vector  of  estimates  0^  the  squared 
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eigenvalues  are  proportional  to  the  lengths  of  the  principal  axes  of  a  confidence 
ellipsoid  for  6.  Hence,  the  determinant  of  the  variance  provides  a  single  measure 
of  the  confidence  ellipsoid's  size.   Provided  the  estimating  functions  have  been 
standardized,  mg  is  the  log  of  the  estimated  generalized  asymptotic  variance  of  0.  It 
is  a  function  of  a;  both  directly  and  indirectly  via  7(^w)  =  Out- 

As  a  final  note,  it  is  not  necessary  for  the  influence  measure  to  be  a  function 
of  point  estimators.   In  Bayesian  analyses,  perturbation  can  be  measured  by 
the  Kullback-Leibler  distance  between  the  perturbed  and  unperturbed  posterior 
densities.  Such  a  measure  would  depend  on  u;,  but  not  on  any  point  estimates  of 
0.   McCulloch  (1989)  developed  a  general  local  influence  technique  for  assessing 
perturbation  of  the  prior  distribution  in  a  Bayesian  analysis.    Other  types  of 
perturbation  analysis  for  Bayesian  models  are  found  in  Kass  et  al.  (1989),  Geisser 
(1991),  Geisser  (1993),  and  Weiss  (1996). 

4.2     Local  Assessment  of  Perturbations 
At  this  point  we  have  set  the  stage  for  our  sensitivity  analysis:  a  perturbation 
is  introduced  into  the  inference,  and  a  measure  of  the  effects  is  chosen.  The  next 
step  is  to  obtain  some  useful  diagnostics.  This  section  decribes  how  a  Taylor  series 
expansion  of  m  and  some  differential  geometry  results  can  be  used  to  describe  the 
function. 

4.2.1     First-order  Local  Assessment 

As  in  Chapter  1,  let  ijiJ.u{a)  =  ujq  +  av  he  a  vector  in  the  space  of 
perturbations.  The  unit-length  vector  v  determines  the  direction  from  Wq  and  the 
value  a  determines  the  distance  from  a>o  in  fi.  The  value  of  m  for  a  perturbation  of 
length  a  in  direction  v  can  be  approximated  by  a  first-order  Taylor-series  expansion 
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around  the  null  perturbation: 

m(a)  «  m(0)  + -^(a  -  0) 

=  m{0)+ap-v  (4.14) 

=  m(0)  +  arh'v, 

where  the  partial  derivatives  are  evaluated  at  a  =  0.   The  quantity  rh'v  has  a 
geometric  interpretation.  Consider  the  influence  graph  of  m  (i.e.,  the  influence 
function  described  as  a  surface  in  3f?'"^^).  For  each  direction  v,  there  is  an  associated 
lifted  line  on  the  influence  graph.  The  slope  of  this  "slice"  of  the  graph  is  given  by 

S  =  rh'v,  (4.15) 

and  it  is  sometimes  called  the  directional  derivative.  The  direction  with  the  largest 
absolute  slope  is  proportional  to  m,  the  gradient  vector. 

Both  the  directional  derivative  and  the  gradient  vector  have  diagnostic  value. 
To  explain,  recall  from  Chapter  2  that  we  may  use  a  perturbation's  Euclidean 
distance  from  the  null  perturbation,  ||ti;  —  a;o||,  as  a  measure  of  its  size.  Further,  let 
us  define  the  effect  of  the  perturbation  as  |m(o)  —  m(0)|.  If  m(0)  =  0,  the  effect  of 
a  perturbation  is  the  absolute  value  of  m{a).  The  directional  derivative  can  then 
be  interpreted  as  the  approximate  eff"ect  of  a  size-1  perturbation  in  direction  v. 
Furthermore,  suppose  that  we  have  perturbations  in  two  different  directions:  u;^ 
in  direction  v^  and  Wg  in  direction  ug.  If  1 5^  |  is  A;  times  larger  than  j^sl,  then 
perturbation  Wg  must  be  k  times  larger  than  a>^  to  have  the  same  eflFect.  Therefore, 
directional  derivatives  are  measures  of  the  relative  influence  of  perturbations. 
Additionally,  the  gradient  vector  shows  how  perturbations  can  be  scaled  to  achieve 
maximum  local  influence.  That  is,  Wq  ±  (m/||m||)  has  the  largest  approximate 
effect,  ||m|p,  among  perturbations  of  size  1. 
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If  m(0)  is  a  local  minimum,  maximum  or  saddlepoint  of  the  influence  graph, 
then  the  gradient  is  zero.  Here,  a  first-order  local  assessment  does  not  provide  any 
diagnostic  information.  Instead,  we  can  turn  to  a  second-order  approximation  of  the 
influence  graph. 

4.2.2     Second-order  Local  Assessment 

A  second-order  Taylor  series  expansion  of  m  is  given  by 

m(a)  «  m(0)  +  — (a  -  0)  + -^^(a  -  0)2 

,^,        dm         a^    ,   dm^  /^  i«\ 

a^       - 
=  m(0)  -I-  arh'v  +  —v'Mv,  ,. 

where  the  derivatives  are  again  evaluated  at  a  =  0.  The  quantity  A  =  v'Mv  also 
has  a  geometric  interpretation:  it  is  the  rate  of  change  in  the  slope  of  the  lifted  line 
associated  with  direction  v,  i.e.  it  is  the  acceleration  of  the  curve.  The  matrix  M 
is  the  acceleration  matrix,  and  the  eigenvector  associated  with  its  largest  absolute 
eigenvector  is  the  direction  of  maximum  absolute  acceleration.   If  the  gradient 
vanishes,  the  second-order  approximation  simplifies  to: 

m{a)  =  m(0)  +  —v'Mv.  (4.17) 

Li 

Both  the  acceleration  and  the  acceleration  matrix  have  diagnostic  value, 
particulary  when  the  gradient  vanishes.  In  this  case,  A  can  be  interpreted  as  the 
approximate  eff'ect  of  a  size-\/2  perturbation  in  direction  v.  In  addition,  suppose 
that  we  have  perturbations  in  two  different  directions:  u;^  and  wg.  If  \Aji\  is  k 
times  larger  than  \As\,  then  perturbation  U3s  must  be  \/k  times  larger  than  w^  to 
have  the  same  effect.  Finally,  the  eigensystem  of  the  acceleration  matrix  can  be 
used  to  identify  influential  perturbations.  The  largest  absolute  eigenvector  of  M 
is  the  maximum  approximate  eff'ect  of  a  size-v/2  perturbation,  and  the  associated 
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eigenvector  is  the  direction  of  maximum  approximate  effect.  Other  eigensolutions 
can  also  be  used. 

4.3    Acceleration,  Curvature  and  Normal  Curvature 
Two  more  geometric  descriptions  of  the  influence  measure  can  be  calculated: 
curvatures  and  normal  curvatures.   These  quantities  are  functions  of  both  the 
gradient  vector  and  the  acceleration  matrix.   When  the  gradient  vanishes,  the 
curvature,  normal  curvature,  and  the  absolute  value  of  the  acceleration  associated 
with  a  directional  vector  are  all  equivalent.  From  (1.5)  the  curvature  associated  with 
direction  v  at  the  null  perturbation  is 

id^m.{a)  I 


c 


da'^ 


Using  (4.16),  it  follows  that 

\v'Mv\ 


(4.18) 

a=0 


c  = 


(1  +  v'rnrh'vYI'^  ■■■         '  ,        . 

(4.19) 

"  (i;'(/g  +  mm»3/2- 

If  the  slope  is  zero,  then  C  =  |v4|.  As  mentioned  previously,  curvature  is  a  measure 
of  how  quickly  a  curve  turns. 

The  notion  behind  normal  curvature  is  a  bit  more  complicated.  The  vector 
(u;„(a)',  0)'  in  W~^^  is  projected  onto  the  tangent  plane  of  the  influence  graph  rather 
than  the  graph  itself.  A  plane  is  then  created  that  contains  both  the  projection 
as  well  as  the  vector  that  is  orthogonal  to  the  plane,  the  principal  normal  vector. 
This  plane  then  intersects  the  surface  to  create  a  normal  section.  Using  the  tangent 
plane  as  the  basis,  curvature  of  the  normal  section  can  be  computed  as  if  it  were 
a  simple  plane  curve.  The  resulting  normal  curvature  is  a  description  of  how  the 
influence  graph  deviates  from  its  tangent  plane.  The  normal  curvature  associated 
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with  direction  v  at  the  null  perturbation  is  (Cook  1986) 

(l  +  ||m||2)V2t;'(J,  +  mm')r'  ,  ^  "     ^ 

If  the  gradient  is  zero,  then  NC  =  C  =  \A\. 

The  two  geometric  descriptions  NC  and  C  can  be  distinquished  as  follows. 
Curvature  is  the  reciprocal  of  the  radius  of  the  best  fitting  circle  to  the  lifted  line 
associated  with  v,  while  normal  curvature  is  the  reciprocal  of  the  radius  of  best 
fitting  circle  to  the  normal  section  associated  with  v.  If  the  gradient  is  zero,  lifted 
lines  are  normal  sections,  implying  that  NC  =  C  =  \A\.  Additional  insights  are 
found  in  Cook  (1986),  Wu  and  Luo  (1993a),  Wu  and  Luo  (1993b),  and  Fung  and 
Kwan  (1997).  ,  ,  ^ 

4.3.1     How  to  Use  Local  Assessment  of  Perturbations 

Local  assessment  of  perturbations  can  be  used  to  (1)  assess  the  influence 
of  aspects  of  the  statistical  problem  on  inferences,  and  (2)  compare  the  relative 
sensitivity  of  parameter  estimates. 

First,  we  define  the  local  influence  associated  with  a  length-one  vector  v  to 
be: 

{|m'v|       if  m  7^  0 
(4.21) 
\v'Mv\    if  m  =  0. 

Thus,  li{v)  is  simply  the  directional  derivative  unless  the  gradient  vanishes,  in  which 
case  it  is  the  normal  curvature.  The  direction  of  maximum  local  influence,  Vmax,  is 
the  main  diagnostic  tool,  and  it  can  be  displayed  graphically  using  an  index  plot  or 
star  plot  to  informally  assess  the  influence  of  perturbations.  Additionally,  the  local 
influences  of  basic  perturbations  can  be  plotted  against  case  indices  to  compare  the 
effects  of  one- at-a-time  perturbations.  ;  :. 
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The  relative  sensitivity  of  p  parameter  estimates  can  be  assessed  by  plotting 
the  p  maximum  local  influences  for  rrii  =  Oi^, . . .  ,mp  =  9p^.  Measures  with  the 
largest  maximum  local  influences  are  most  sensitive  to  the  perturbation.   Two 
cautions  with  this  technique  should  be  noted.  First,  the  metric  of  the  parameter 
estimates  ought  to  be  comparable.  If  the  metrics  are  not  comparable,  standardized 
parameter  estimates  should  be  used.   The  second  caution  pertains  to  first-  and 
second-order  influence.  Typically,  the  set  of  p  measures  are  assessed  by  a  first-order 
approximation.    However,  if  a  subset  of  the  measures  requires  second-order 
assessment,  then  the  following  revised  definition  can  be  used  to  make  the  local 
influences  comparable: 

■     f,'^  ■'■'*•-    ■■'■  ■     I    \jh'v\  if  m  7^0 

li*iv)  =  <  (4.22) 

.,;  I    \^v'Mv\    if  m  =  0. 

This  ensures  that  all  influence  measures  are  compared  by  the  approximate  effects  of 
length-one  perturbations.  .        , 

4.4    Algebraic  Expressions  for  Local  Assessment 
In  this  section,  algebraic  expressions  for  local  influence  diagnostics  are 
derived. 

4.4.1     First-order  Assessment 

Theorem  4.1 

The  directional  derivative  in  direction  v  of  influence  measure  m{uj,  ^{9^))  at  the  null 

perturbation  is  given  by: 

S  =  {sl  +  s'^KJ)v,  (4.23) 
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where 


dm{oj,'y{e)) 


K 


d(jj 


«7  = 


dm{uj,-f{9J)) 


di{e^) 


89' 


89^ 


duj' 


and  |o  denotes  evaluation  at  the  unperturbed  fitted  model:  u>  =  cjq  and  0^  =  0. 
Proof  of  Theorem  4.1    Taking  partial  derivatives  of  m  we  find 
dm       dm  duj 


da       duj'  da 

'dm{u:,'r{0)) 


dijj' 
dm{u},'y{0)) 


duj' 


^  dm{i^,j{0^))d-f{0^)  ^^ 

dm{LJ,j{0^))d'y{0^)d0^  , 
e=L  d-r{0J         d0^     9u}' 


Evaluating  at  the  original  model  completes  the  proof.     D 

Corollary  4.1 

The  gradient  is  given  by 

m  =  Sj^  +  J'K's~f. 


(4.24) 


4.4.2     Second-order  Assessment 

In  order  to  make  a  general  case  expression  for  the  acceleration,  a  large 
number  of  partial  derivatives  are  needed.  The  following  lemma  provides  an  identity 
used  to  derive  the  algebraic  expression  for  M. 
Lemma  4.1 


Define  the  following: 


Ti 


d(jjd(jj' 


Gi 


dHiO) 


d0d0' 


„  _    d'L 


dwdijj' 


Then, 


T=[lr^   J']   GJ+[K®   Ig]   P, 
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for 


rqxq 


r2 


G 


rpxp 


G2 


r               ^ 

:* 

Pi 

Ppqxq  = 

P2 

.^p. 

and  where  J  and  K  were  defined  in  Theorem  4.1. 
Proof  of  Lemma  4.1    Consider: 


dijjdu}'         dui 
_d_ 


89, 


do',     duj' 


du: 


7  +  E 


d-YiiO^) 


j=i     dOj^ 


du: 


+ 


de... 


Evaluating  this  equality  at  the  null  perturbation  yields 


Ti  —  J  GiJ 


< 


Stacking  the  Fj  to  form  F  completes  the  proof: 


J'GiJ 


j'GrJ 


+ 


[Ir  ®  J']  GJ  + 


di2{L}T        dy2{L)  T 
:rr      J-n        ttx       ■'■n 


ae 


962^ 


=  [Ir  ®  J']  GJ  +  [K®  Jg]  p.       n 


dwduj' 


dwdu' 


J  0 


(4.25) 
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Theorem  4.2 

The  acceleration  ofinSuence  measure  m{u},  7(^w))  in  direction  v  at  the  null  pertur- 
bation is  given  by 


A 


dM^,7{0^)) 


=  V 


da? 


,d^m{u,MO.)) 


duiduj' 


V 


-h  :-  = 


v'Mv 


(4.26) 


=  V 


Ig       J'K' 


-*9 

KJ 


* 


+  [s;  ®  J']  GJ  +  [s'^K  <^Ig]P\  V, 


where 


dwdu'       l0         dud-yiey     I" 
d"i{e)duj'     I0       d'y(6)d'y(0)<  I0 


(4.27) 


Aiternativeiy,  we  may  write 


^  =  1;' 


=  t; 


J,  j'a:' 


I,    J'K' 


-*9 

KJ 


+  Y^8^Ti       V 


i=l 


\ 


(4.28) 


+  j'[5]s^,G,]j  +  ^(s;ii:,)p, 


j=i 


i=l 


V, 


/ 


where  s^.  is  the  i*^  element  of  s^  and  Ki  is  the  i*^  column  of  K. 

Proof  of  Theorem  4.2    First  we  define  the  following  {q  +  r)  x  1  vector: 

/ 


S  = 


\ 


lie.) 
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Then, 


9^m(a;,7(gJ) 
■:     da' 


d_ 

da 

d_ 

da 

d_ 

da 


dm{8)  d(jj 
duj'    da 
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duj' 
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dudu} 
d_ 
d(jj 

_a_ 
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dm[5)  dS 
dS'    du:' 
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dmjS) 
dS' 


dS 
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fi,.,'  "^  2^ 


du3 


dS'd^mjS)  dS 
duj  dSdd'  duy 

'dS'd'^miS)  dS 


i=l 
g+r 
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r 


dm{S)    d^Sj 
d6i     dijjdijj' 

dm{6)    d^Sj 
dSi     duidu' 


V. 


V. 


,  V^  dmjS)  d'jije^) 


(4.29) 


aw  dSdS'  du;'      -^  57i(^^)  dupdu' 

where  the  last  simplification  occurs  because  d'^cui/divdu}'  =  0.  Examining  the  first 
term  of  d'^Tn{S)/du}du}',  and  evaluating  at  the  null  perturbation  we  find 


dS'  d^m{6)  d6 


du  dddS'  dui' 


dui  dui 


I,    J'K' 


d^mjS) 
dSdS' 


Jo 


dw 

dui' 

du' 


J  0 


dujdw'        l0  dwd-y(eY     l^ 

di(9)du)'    lo     d'y(e)d'r{eY  1° 


(4.30) 
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Examining  the  second  term  of  d^m{6)/du:du}'  and  evaluating  at  the  null 
perturbation,  we  find 


dmjS)  g^7i(^J 


E 


=  [s;  O  /,]  ([/,  ®  J']  GJ  +  [K^  Ig]  P) 
=  [s;  0  J']  GJ  +  [s'^K  ®  /,]  P 


(4.31) 


by  Lemma  4.1.      D 


4.4.3     Comments 


'  While  the  algebraic  expression  for  5  is  quite  simple,  the  expression  for  A 
is  somewhat  cumbersome.  However,  a  second-order  assessment  would  typically  be 
used  only  when  the  gradient  vector  vanishes.  In  most  situations  then,  s-y  =  0,  and 


the  acceleration  matrix  simplifies  to 


M  = 


I,    J'K' 


■*9 

KJ 


(4.32) 


The  algebraic  expressions  are  helpful  in  that  they  decompose  the  diagnostics 
into  building  blocks,  most  of  which  are  immediately  computable.  Specifically,  the 
quantities  8^,  s^,  S,  G,  and  K  should  be  immediate  upon  choosing  an  infiuence 
measure  and  fitting  the  model.  In  the  next  section,  it  is  shown  that  the  matrices  J 
and  P  are  also  easily  obtained  in  most  situations. 

4.5     Computational  Formulas  for  J  and  P 
In  order  for  the  formula  for  the  gradient  and  acceleration  matrix  to  be  of 
practical  use,  they  must  be  easily  computed  in  a  wide  range  of  applications.  In  this 
section,  it  is  shown  that  the  matrices  J  and  P  can  be  computed  even  when  the 
unperturbed  parameter  estimates  are  not  closed-form.  The  only  requirement  is  that 
the  perturbed  estimating  functions  be  closed- form  expressions  of  a; . 
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4.5.0.1     Computation  of  J 

The  following  theorem  generalizes  equality  (14)  in  Cook  (1986). 
Theorem  4.3 
Let  Oij  be  the  solutions  to  the  p  perturbed  estimating  equations  hi^{0;  y)  =  0.  Then, 

J=-h~^A,  -  (4.33) 

where 

dK{e;y) 


A  = 


duo' 
Proof  of  Theorem  4.3    By  assumption,  we  have  the  following  equality 

Taking  derivatives  with  respect  to  cj'  on  both  sides  we  find 


(4.34) 


dK{e-y) 


die' 
where  0  is  now  a.  p  x  q  matrix.  Evaluating  both  sides  at  the  original  model  we  find 

A  +  hJ  =  0, 

and  the  theorem  follows  directly.      D 

Theorem  4.3  has  two  implications  on  computation:  the  gradient  vector 
is  easy  to  calculate,  and  the  acceleration  matrix  is  easy  to  calculate  provided 
s^  =  0.  Neither  h      nor  A  require  that  the  solutions  to  the  estimating  equations 
be  closed-form  for  either  the  perturbed  or  the  unperturbed  inferences.  In  addition, 
the  matrix  h      is  often  computed  as  part  of  the  estimation  process.  Not  only  is 
it  a  by-product  of  using  the  Newton-Raphson  root-finding  technique  to  solve  the 
estimating  equations,  but  it  serves  as  an  estimate  of  the  asymptotic  variance  of  0, 
the  inverse  of  (4.4).  When  estimation  is  performed  by  ML,  -h~^  is  the  inverse  of 
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the  observed  information  matrix.  The  matrix  A  is  also  easily  computed  provided 
that  the  expressions  for  the  perturbed  estimating  functions  are  known.    This 
computation  was  demonstrated  many  times  in  Chapters  1-3  for  maximum  likelihood. 
Computation  of  A  could  be  difficult  when  the  estimating  equations  themselves  are 
not  closed-form.  An  example  would  be  ML  inference  for  generalized  linear  mixed 
models  (McCulloch  1997). 


4.5.1     Computation  of  P 

In  this  section,  a  formula  for  P  is  given  so  that  it,  like  J,  it  can  be  computed 
without  re-estimating  the  model.  To  begin,  we  present  the  following  lemma,  which 
provides  an  identity  used  to  construct  P. 
Lemma  4.2 
Let  0^  be  the  solutions  to  the  p  perturbed  estimating  equations  hcj{9;  y)  =  0. 


where 


-h'i^Ig        P    = 

I,          J' 
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l,...p, 


hi  =   the  i*^  column  of  h 
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(4.36) 
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J  =  -h     A. 
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Proof  of  Lemma  4.2    Taking  the  z*'*  row  of  (4.35)  in  the  proof  of  Theorem  4.3 
gives  the  following  equality: 


dhi^{e\y) 


d(jj' 


dhi^{e^;y)d0^ 


0'. 


e^e^        del     du^ 

where  0'  is  a  1  x  g  matrix.  Taking  derivatives  of  the  first  term  on  the  left  side  of 
this  equality  with  respect  to  a;,  we  find 


(4.37) 
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(4.38) 


Taking  partial  derivatives  of  the  second  term  with  respect  to  u;,  we  find 
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(4.39) 


8u}8u}' 


Using  (4.38)  and  (4.39),  the  next  step  is  to  take  derivatives  of  the  equality  in  (4.37) 
with  respect  to  oj  and  then  evaluate  the  result  at  the  null  perturbation.  It  follows 
that 
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8lj80' 


0=0  Jo 


80^   ,  80,8%^{0-y) 


Q^Q   8u}'       8uj      8080' 


0=0. 


00^ 

8U3' 


E 

j=\ 


dhi^{0^;y)    8di, 


89, 


to,        dujduj',^ 


(4.40) 


=  Qi  +  J'R'i  +  RiJ  +  J'hiJ  +  Y^ 


dhiu{0;y) 


I    J' 


Qi   Ri 
Hi    hi 


+ 


where  0  is  now  a,  p  x  p  matrix.  The  lemma  follows  immediately.      D 


I 
J 


89, 


h:  ®  I, 
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The  theorem  below  now  provides  a  formula  for  P  that  does  not  require 
re-estimation  of  the  model. 
Theorem  4.4        *^         ^  s  : 
Let  0^  he  the  solutions  to  the  p  perturbed  estimating  equations  hu{0;  y)  =  0.  Then, 


P  = 


-1 


h       ®[la     J'] 


* 


/ 
J 


(4.41) 


where 


*  = 


*i 


*. 


(4.42) 


Proof  of  Theorem  4.4   Taking  the  p  sets  of  equalities  of  the  form  given  in 
Lemma  4.2,  we  find  ., 


-hi  (8)  Jq 


hp    ®    Ig 


Manipulation  of  the  left  side  of  the  equality  proceeds  as: 


Ig  J' 

*1 

I, 

J 

• 

/,  J' 

J 

(4.43) 


-^'l  ®  Iq 

h\\Iq      hi2lq 

• 

p  = 

h^llq      h22lq 

-h'p  (2)  Ig 

= 

-h®Ig 

P. 
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Manipulation  of  the  right  side  of  the  equality  proceeds  as: 


I,  J'  \ 

*1 

I, 

J 

r 

lo 

J' 

*1 

r      1 

\ 



L 

J 

h 

r             -] 

r 

-| 

L     J 

J 

I,  J' 

*p 

J 

- 

h 

J' 

[I,    J']  0 

0  [la     J'] 


Ip^ila     J'] 


* 


It  then  follows  from  (4.43)  that 


h^Ir, 


h  ®In 


"W 


Ip^lla    J'] 


* 


* 


1 

^7 

Ip^  I,  J' 

* 

J 

J 

J'] 

* 

I, 

.    n 

-1 

J 

Corollary  4.2 


Pi  =  Y^i-h'^) 


p 


Ig       J' 


*. 


=  Yi-h''){Qj  +  RjJ  +  J'R'j  +  J'hjJ) 
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4.5. 1.1     Remarks 

The  formula  for  P  can  be  combined  with  the  results  of  Theorem  4.2  to 
further  revise  the  acceleration  matrix: 


M  = 


dujduj' 


I    J'K' 


I    J'K' 


I 
KJ 

I 
KJ 


+  [s;  ®  J']  GJ  +  [s'^K  ®  /,]  P 


+  [s;  ®  J']  GJ 


s'^K  ®  Jj 


h~   ®  [  J^    J'] 


* 


I 
J 


I    J'K' 


I 
KJ 


+  [s;  ®  J']  GJ 


+ 


-s'^Kh      ®[T      J' 


* 


I 
J 


Also,  we  may  write 

p 


P 


1=1 


p  p 


/,  J' 

*. 

^. 

J 

=  EEK^^)(-^'') 


j=l  i=l 


/,  J' 


*. 


-*9 

J 


=  Y^{s'^K{-h-;)) 


j=i 


/,  J' 


*, 


(4.44) 


(4.45) 


Hence,  another  form  of  the  acceleration  is  given  by 


M 


I    J'K' 


I 
KJ 


^J'[Y.s„G,]J 


i=\ 


i^^Kh,  ') 

I.    J' 

*i 

I, 

i=i                    L              J 

J 

Note  that  in  all  computations,  J  =  —h     A  as  per  Theorem  4.3. 
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(4.46) 


4.6     Local  Assessment  for  Maximum  Likelihood  Inference 
In  this  section  some  brief  examples  when  estimation  is  performed  by  ML  are 
given.  The  influence  measures  addressed  include  linear  combinations  of  parameter 
estimates,  ratios  of  parameter  estimates,  the  likelihood  displacement,  the  profile 
likelihood  displacement,  and  the  Kullback-Leibler  divergence.  Emphasis  is  placed  on 
constructing  the  building  blocks  of  local  influence.  From  Theorem  4.3,  J  =  —L    A. 
The  individual  measures  determine  7(^0;)  and  the  remaining  building  blocks: 


dmiiuniO)) 


K 


dm{u},'y{e^)) 


36' 


S  = 


dj{e^) 


4.6.1     Linear  Combinations 

Consider  a  linear  combination  of  the  parameter  estimates  under  perturbation: 


^ti>.: 


m  =  xOui. 


The  building  blocks  of  first-order  local  assessment  are: 


7(00,)=    x'K 


s^=    0 


K=    x' 

8r^      =  1. 


Ill 

Construction  of  the  gradient  and  directional  derivative  is  immediate: 

S^rh'v 

—  -x'L    Av. 

Note  that  the  direction  of  maximum  absolute  slope  is  proportional  to  m  oc  A'L  x. 
This  is  exactly  the  same  as  the  direction  of  largest  curvature  for  the  profile  likelihood 
displacement  of  x'O^j  under  the  orthogonal  parameterization  given  in  Theorem  2.1. 

y*         ,.  J,         ■>.     ■  \ 

4.6.2  Ratio  of  Estimates    i«    •    VV  ^'      "^^ 

For  the  measure  m  =  Oi/Oi^^,  the  building  blocks  for  a  first-order  local 
assessment  are 

where  dj  is  a  p  x  1  vector  with  a  1  in  the  i"*  position  and  zeroes  elsewhere.  Hence, 
S  =  rh'v  =  (si  +  s'KJ)v  =  Id'iL'^Av  =  IV'Av, 

"  i  +1,  "  — 1 

where  L  is  the  r"'  column  of  L    . 

4.6.3  Likelihood  Displacement 

Recall  that  the  likelihood  displacement  is  given  by 

m(a;,7(^J)  =  2[L(^;i/)-L(0<,;2/)]. 

This  influence  measure  requires  a  second-order  local  assessment,  as  is  now  shown. 
First,  it  does  not  depend  on  w  explictly,  so  8^^  vanishes.  Since  the  exact  form  of  the 
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likelihood  has  not  been  specified,  we  simply  write  7(^cj)  =  9^.  It  follows  that 


am(a;,7(^J) 


dl{9.) 


dL{e^-y) 


=  -2 
0  56>, 


=  -2 


dL{e-y) 


80 


0. 


Hence,  the  gradient  vector  is  zero.  Proceeding  with  a  second  order  assessment,  we 


first  note  that  K  =  Ip.  It  follows  that 


M  = 


I,    J'K' 


I    J'K' 


I    -A'L 


I    -A'L 


-1 


■^9 

KJ 

I 
KJ 

0  0 

0       0 

0    -21/ 


+  [s;  ®  J']  G J  +  [s'^K  ®  Jj  P 


L    A 


-L    A 


=  A'L  \-2L)L  ^A 
^-2A'L~^A. 


This  is  equivalent  to  the  result  given  in  Section  2.1,  as  per  Cook  (1986). 

4.6.4    Profile  Likelihood  Displacement 

The  profile  likelihood  displacement, 

m(a;,7(0J)  =  2[L(^i,^2;y)  - /^(^w,^(^iu.);2/)], 

also  requires  a  second-order  assessment.   First,  the  influence  measure  does  not 
depend  on  a;  explictly,  so  s^j  vanishes.  Also, 


O-y     


am(a;,7(0.)) 


di{e^) 


-2 


aL(7(^J);y) 


di{e^) 


-2 


dL{e-y) 


30 


=  0. 


This  implies  that  the  gradient  vector  is  zero. 
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The  next  step  is  to  construct  the  building  blocks  of  a  second-order  assessment. 
In  order  to  obtain  a  formula  for  K,  we  present  the  following  result,  which  appeared 
in  Cook  (1986)  as  equation  (24): 

.  dg{9,) 


de\ 


—  Llryn   Ll 


22  ■'^21 


where 


X  = 


L21    L22 


Using  this  result,  we  find 


K 


dim 


dO,. 


d9i 


I         0 

—  1/22  -^21      0 


(4.47) 


(4.48) 
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It  follows  that  the  acceleration  matrix  for  the  profile  likelihood  displacement 


IS 


M^'^'  = 


I    J'K' 


I    J'K' 


I    J'K' 


I    J'K' 


=  -2J'K'LKJ 


KJ 


+  [s;  ®  J']  GJ  +  [s'^K  0  Ig]  P 


0  0 

"  d')(e)di{9)'  10 

0  0 

'-'  flOflOl 


/ 

KJ 


d9d9'     10 


/ 

KJ 


0       0 

0    -2L 


I 
KJ 


-2A'L 


=  -2A'L 


0  0 


■t'll      ■£'12 

L21    L22 


/.,  0 

22  ^21 


-1/99   Loi         0 


Z-^A 


-t'li  —  L12L22  L21    0 
0  0 


L     A 


-2A'{L  '  - 


0      0 

0    L22' 


)A. 


This  is  equivalent  to  the  formula  given  in  Section  2.1,  as  per  Cook  (1986). 

4.6.5     Kullback-Leibler  Divergence 

The  Kullback-Leibler  divergence, 


m  —  Ea 


log(i(?^) 
/(W|9J 
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also  requires  a  second-order  assessment.   The  measure  does  not  depend  on  u; 
explictly,  so  s^  vanishes.  Also,  7(^0;)  =  0^,  implying 

dm{u;,-f{e^)) 


di{e^) 


=  — ttEo 

del 

dE, 


f{y\e.)' 


oe' 


--Ee 


dL{e^-y) 


de, 


-Ee  [0] 


0. 


Proceeding  with  a  second-order  assessment,  we  first  note  that  K  =  Ip.  We 


also  make  the  following  observation: 


•va(-. 


— ; TrEa 


f{y\e^y 


— : ttEs 


L{e^\y) 


■    .  .   -.    ■  =-^ 

Evaluating  this  at  the  null  perturbation  implies 

'd^L[0-y) 
ddde' 


(4.49) 


'-'77  ~     '^e 


Jo 


and  the  acceleration  matrix  follows: 


I    J'K' 


I    -A'L 


=  -AT'    E 


/ 
KJ 

0      0 

0   s 


77 


-L     A 


(4.50) 


d'L{0;y) 
8080' 


l-'a. 


116 


An  interesting  consequence  of  this  result  is  the  following:  if  the  observed  and 
expected  information  matrices  are  equal,  then  the  acceleration  matrix  is  proportional 
to  the  acceleration  matrix  of  the  likelihood  displacement. 

4.7    First-order  Assessment  of  the  Log  Generalized  Variance 
In  this  section,  some  general  results  for  first-order  local  assessment  of  the  log 
of  the  generalized  variance  are  presented.  In  addition,  the  directional  derivative  for 
the  variance  perturbation  of  a  regression  model  under  ML  is  derived. 

4.7.1     General  Form  of  the  Directional  Derivative 
Theorem  4.5 
For  the  measure 


m  =  log 


the  A;*''  element  of  the  vector  s^  is  given  by 

'd'K{0;y) 


—trace 


h 


duJkdO' 
and  the  f^  element  of  the  vector  8^  is  given  by 


k  =  l,...q, 


—  trace 


Additionally,  we  may  write 


dejdo' 


h 


j  =  l,...p. 


s^  =  R{-h    )  = 


ill      -^2      •  •  •      Rp 


vec 


(-^     ), 


where  each  Rj  is  a  q  x  p  matrix  of  the  form 


Rj  — 


3  =  l,-P- 


(4.51) 


(4.52) 


(4.53) 


(4.54) 
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Similarly, 


8-y  =  h{—h    ) 


hi    /i2  hp 


vec{—h     ), 


(4.55) 


where  each  hi  is  a  p  x  p  matrix  of  the  form 


d9d0' 

Note  that  the  matrices  Ri  and  hi  appeared  earlier  in  the  computational 
formula  for  P  (Theorem  4.4). 
Proof  of  Theorem  4.5 

To  begin,  we  present  the  following  two  matrix  differentiation  results  for  a  matrix  A 
and  a  scalar  t  (Searle  et  al.  1992): 


dAT^  _  _^-.dA 

dt    ~    ^    dt^ 


d_ 
dt 


log  |A|  =  trace 


.idA 

'dt 


(4.56) 
(4.57) 


Combining  these  two  identities  yields 
d 


dt 


log  \  —  A  ^  I  =  trace 


—  trace 
=  —trace 


(-A-')-'l(-A-') 


dA 
df 


(4.58) 


Using  (4.58)  we  find 
.       d 


dui 


m{u},  0u)  =  —trace 


d^K{e-y) 


dcjkde' 


dK{0^;y)  ^^ 


6=9^ 


dO' 


y 


and  evaluating  at  a;  =  wq  and  6  =  0  yields  (4.52).  Expression  (4.53)  follows  from  a 
similar  argument. 
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The  next  step  is  construction  of  (4.54).  First,  the  trace  of  product  of  two  matrices 
Arxc  and  Bcxr  is  (Searle  1982) 


Letting  A*-'  denote  the  ij*''  element  oi  h     ,  we  may  write  (4.52)  as 


p     p 


dhU0;y) 


p     p 


h'K 


Construction  of  s^  follows: 


«a,  =  - 

P        P 

1=1  j=i 

3hi,^{e;y) 

aK,i(6\y) 

dhi,^{e;y) 

dhx,^{e;y) 
dOpUJi 

a/i2,u,(e;v) 

deiuji 

dhiAOw) 

aepui 

dhpMOw) 

d6p(jj-i 

= 

dhi,^(e;y) 
dBiuj2 

dhi,^{e;y) 
d9pUi2 

'"  <-^' 

■ 

dOlUlq 

dhi,^{e;y) 

dOpUlg 

dh2,u(e;y) 

deiUq 

dh2,u,(6;y) 

dBpUq 

dhp,^(0;y) 

dOpUlq 

vec{~-h    ) 


ill     R2     •  •  •     Rp 


vec{—h    ). 


Construction  of  s^  proceeds  similarly.      D 

Corollary  4.3 

For  the  measure  (i.bl),  the  slope  in  direction  v  is 


-1, 


s  =  vec{-h  y 


R'  +  hJ' 


V, 


and  the  direction  of  maximum  absolute  slope  is 


(4.59) 


Umax  <X  m  = 


R  +  Jh 


vec{h     ) . 


(4.60) 
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4.7.2     Variance  Perturbation  for  ML  Regression 

In  this  section,  first-order  local  assessment  diagnostics  are  derived  for  the 
variance  perturbation  in  ML  regression  with  the  perturbed  generalized  variance 
as  the  objective  measure.   Because  h      has  a  simple  form,  we  derive  s^^  and  s^ 
element-by-element  rather  than  construct  the  matrices  in  (4.54)  and  (4.55). 
4.7.2.1     Construction  of  8t^ 

To  derive  the  i*^  element  of  s^^  we  begin  with  the  following  results  from 
Section  2.5: 

d^K{e-y)  _  1 


dl3dui 
d'LUO;y)        1 


;{yi  -  x'i0)xi 


'/3\2 


{yi-x[0) 


(4.61) 
(4.61a) 


where  Xj  is  the  covariate  vector  for  the  i*''  observation.  Taking  partial  derivatives  of 
(4.61)  leads  to 


d'LUe;y) 
d^d/3'duji 


xnp-i)Xi 


r.XiX^ 


Taking  partial  derivatives  of  (4.61a)  yields 

d'L^{e;y)  1 


'  a\2 


{Vi  -  <I3) 


After  evaluating  these  expressions  at  the  null  perturbation,  we  can  construct  the 


matrix 


dK{9-y) 


duide' 


d'L^{0;y) 


dedd'dui 


-  ~2XiXi 
r;       / 


v4  "^i 


-s 


(4.62) 
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Accordingly,  the  trace  of  the  following  matrix  multiplication  is  the  i^^  element  of  s^^: 


dK{0;y) 


dujide' 


{-h    )  = 


--^y 
»*^i 


a^x'xy^      0 

0'  2(TVn 


XiX^{X  X)  "^"^i 


^x[{X'X) 


lV\-l         2r^ 


(4.63) 


In  order  to  find  a  formula  for  the  trace,  an  expression  for  the  fc"'  diagonal  element 
of  the  upper  left  portion  of  the  matrix  is  needed.  Letting  Ujk  be  the  jk*^  element  of 
{X'X)-\  the  A;*''  diagonal  element  of  Xix'iiX'Xy^  is 

p-i 

'•}■■ 
•  .  j=l  ■  *  . 


Hence,  the  trace  of  (4.63)  is  given  by 

p-i  P-i  2r^  ""^       ^~^  2r? 

fc=i  j=i  j=i        fc=i 

From  this,  the  vector  s^j  can  finally  be  constructed: 


■vp  — 1 


2rH 


(Xy7=l  ^nj  Zjfc=l(%fe^nfe)j  +  „o.2 


X21X2 

an 

1-, 

— 

"21 

_  a2(p-i)  _ 

%;K   \ 


Xi(p-i)Xi 


Xn{p-l)X^ 


a(p-i)i 


a(p-i)(p-i) 


na 


2    «9 


(4.64) 


Letting  a,  be  the  i^^  column  of  (X'X)   ^,  it  follows  that 
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r                       -| 

r                      -1 

XuX[ 

^i2a:i 

3:21X2 

ai  - 

3^21  aj'a 

a2.. 

.  — 

_  a;„ix;  _ 

_  a;„2<  _ 

X2{p-l)X2 


^n(p-l)25„ 


flp-1 


ncr 


2'  «9 


XiiX'i 


3:21X2 


a:i(p-i)Xi 

3:2{p-l)X2 


3:nlX„      .  .  .     Xn(p-i)Xj^ 


tti 


ttp-i 


no 


2'  «9 


£)(Xi)X    ...    D(Xp_i)X 


vec(X'X)-^  -  — -r 


ncr 


2'  «9 


D(Xi)    ...    D(Xp_i) 


[Vi®X]vec(X'X)-^ 


no" 


2'  «9- 


4.7.2.2     Construction  of  Sy 

Starting  from  the  Hessian  of  the  unperturbed  log-likelihood, 


dh{e;y)  _dL{e;y) 


d0' 


dOdO' 


-ix'x 


-^(y-X/3) 


■My  -  x/sy     ^-Mv-  ^^Yiy  -  x/3) 


(4.65) 


the  following  partial  derivatives  at  the  null  perturbation  can  be  computed: 

d'Lie^y) 


d'L{0;y) 


=  0 


da^d^dlS' 
d'L{0;y) 


Xx'x 


d'HB;  y) 


=  -.X'iy  -  X/3) 


=  ttXV  =  0 


d{a^y 


-^e  +  ~s(y-xf^ny-x^) 


n       Sr'r       2n 

a^        (T*         a^ ' 
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Now  the  elements  of  Sj  can  be  constructed: 


dm{u),  e^) 


d/3i 


=  trace ( 


d'L{0-y) 


=  trace ( 

=  trace ( 
=  0. 


d/3id0d0' 
0 


i-h-  )) 


1    v' 


X'Xi 


1  vi 


'-.x\x       0 


0 


j,X',X{X'X)-' 


<t2(X'X)-i      0 
0' 


2£l 

n 


"^X'Xi 

•n  ' 


0 


) 


Also, 


dm{(jj,  0^) 


da^ 


—  trace ( 
=  trace ( 

=  trace ( 
p  +  3 


d'L{e;y) 


da^dode' 

i(X'X)     0 


i-h")) 


0' 


^I     0 


0'     4 


2n 


a2(X'X)-i      0 


0'  ^ 

n 


(4.66) 


(4.67) 


Hence,  all  but  the  last  element  of  s^  is  zero: 


tJ/y     


0 
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4.7.2.3    Directional  derivative 


Using  the  results  above  and  A  from  Section  2.5,  we  may  finally  construct  the 
directional  derivative  of  the  log  of  the  perturbed  generalized  variance 

S={8l  +  s'KJ)v 


=  {8l  +  s'K{-L-)A)v 


-K  + 


=  «  + 


0'      2+3 


Q,       {p+Z)a^ 


a2(X'X)-i      0 
0' 


251 

n 


A)v 


j,X'D{r) 


2&*     sq 


)V 


(-[D(Xi)    ...    D(Xp_i)][Vi®^]vec(A-'X)-i 


\T..^ 


^n'   sq      '  ~  9 


p  +  3       , 
rsq)v 


=  {-[D{X,)    ...    D(X,_0][Vi®^]vec(X'X)-^  +  ^r,,)V 


Hence,  the  direction  of  the  maximum  absolute  slope  is 


(4.68) 


Umax  oc  -[  D(Xi)    . . .    r>(Xp_i)  ]  [Ip-i  ®  X]  vec(X'X)-^  +  ^^r,,.      (4.69) 


4.7.2.4    Simple  random  sample 

As  a  special  case,  consider  a  normally  distributed  simple  random  sample 
(SRS).  The  direction  of  maximum  slope  simplifies  to 


1  3 

Vmax  OC  m  =  -£>(!„)  [1  ®  1„]  vec(-)  +  ;jT^r«, 


n        na' 


oc 


fsq  „   In 


(4.70) 


Hence,  observations  with  squared  residuals  that  deviate  largely  from  ^  have  large 
elements  in  Umax-  This  may  seem  odd,  but  recall  that  this  represents  influence  on 
an  estimated  joint  asymptotic  confidence  region  for  /i^^  and  a^.  If  the  log  of  the 
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Figure  4.1:  Hills  data;  Vmax  for  perturbed  log  generalized  variance  of  0 

variance  of  these  two  estimates  is  considered  separately,  the  directions  of  maximum 
absolute  slope  are  respectively  given  by 


»^niax  CX  T 


sq 


a^r 


^max  0^  ' 


sq 


The  direction  of  maximum  slope  for  (1^  is  dominated  by  squared  residuals  that 
deviate  greatly  from  the  average  squared  residual,  a^,  while  Umax  for  a^  is  dominated 
by  large  squared  residuals.  The  direction  of  maximum  slope  for  the  joint  influence 
measure  is  simply  a  weighted  average  of  these  two  vectors. 
4.7.2.5     Examples 

As  examples,  we  consider  the  Scottish  hills  data  and  the  school  data.  For 
the  hills  data,  the  element  of  Umax  for  observation  1 1  is  large  and  the  element  for 
observation  18  is  huge  (Figure  4.1).  For  the  school  data,  elements  3  and  18  show 
substantial  influence  on  the  perturbed  generalized  variance  (Figure  4.2). 
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Figure  4.2:  School  data;  fmax  for  perturbed  log  generalized  variance  9 


4.7.2.6    A  simpler  measure 

The  directional  derivative  simplifies  greatly  if  the  measure  is  based  on  the 
unperturbed  generalized  variance  evaluated  at  the  perturbed  estimates: 


m  =  log 


'    ae'     ' 


In  this  situation,  s^  vanishes,  but  the  other  derivations  follow  from  above,  yielding 
the  following  results  for  the  variance  perturbation: 


c      P  +  3   , 


na 


2       »9 


'-'max 


p  +  3 


''^max  CX  T 


sq- 


4.8    Local  Assessment  for  Quasi-likelihood  Inference 
In  this  section,  local  influence  assessment  of  quasi-likelihood  estimates 
(Wedderburn  1974)  for  independent  observations  is  outlined.  A  thorough  treatment 
of  quasi-likelihood  inference  can  be  found  in  McCullagh  and  Nelder  (1989).  Here,  we 
present  only  some  fundamental  concepts,  viewed  from  the  vantage  point  of  optimal 
estimating  equations  (Godambe  and  Heyde  1987;  Desmond  1997a).  Some  local 
influence  techniques  are  then  outlined. 
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4.8.1  Quasi-likelihood  ... 
Consider  a  model  in  which  a  set  of  n  independent  observations  y  depends 

on  a  set  of  p  parameters  0  through  their  means  ti{0).  Although  full  distributional 
assumptions  are  not  made,  we  also  assume  that  the  variances  of  y  depend  on  their 
means:  V[j/]  =  a'^V{n).  A  class  of  estimating  functions  that  we  may  consider  for 
estimating  0  are  given  by 

h*i0;y)  =  W{0){y-fi{0))y,  (4.71) 

where  the  p  x  n  weight  matrix  W{0)  can  be  a  function  of  the  unknown  parameters. 
Standardizing  (4.71)  as  in  (4.3)  yields  (Wedderburn  1974;  Godambe  and  Heyde 
1987): 

h{e;  y)  =  %V{t^)-\y  -  t^m/<^'-  (4.72) 

These  optimal  estimating  functions  are  called  the  quasi-score  functions  because 
the  solutions  to  h{0;y)  =  0  maximize  the  following  "quasi" -likelihood  function, 
provided  the  integral  exists:  • 

4.8.2  Local  Assessment 

Some  preliminary  results  for  local  assessment  of  quasi-likelihood  models  for 
independent  data  are  presented  in  this  section.  Assumptions  about  the  model's 
form  are  presented,  some  perturbation  schemes  are  proposed,  and  a  case-weight 
perturbation  is  applied  to  an  example  data  set. 
4.8.2.1     Assumptions 

It  is  assumed  that  observations  are  independent,  and  that  the  variance  is 
related  to  the  mean  by  the  function  V(/i).  In  addition,  it  is  assumed  that  the  mean 
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is  related  to  a  linear  predictor  77  =  x'fi  through  a  link  function,  g: 


gilJ')  =  V- 


4.8.2.2     Perturbation  Schemes 

One  perturbation  is  to  introduce  case- weights,  or  equivalently,  variance 
perturbations.  A  set  of  n  perturbations  can  be  introduced  into  the  estimating 
functions  in  a  multiplicative  manner: 

hU0;y)  =  ^D{u:)V{fj,)-'{y-t^{e))/a\ 

Writing  the  j'*''  perturbed  estimating  function  in  summation  notation  shows  how  the 
perturbations  weight  the  contributions  of  the  individual  observations: 


The  matrix  A  is  then  given  by 

dhUe;y) 


A  = 


d(jj' 


1  ^^^'.DiV^) 


0      a^dO      'V{^i) 

where  D{^^^)  is  a  diagonal  matrix  with  i*^  diagonal  element  {xji  -  Hi)/V{fii). 

If  the  response  is  continuous,  then  a  perturbation  can  be  made  to  assess 
additive  errors  in  measurement.  The  perturbed  estimating  functions  are  then 


KiO;y) 


'de 


V{n)-^y  +  u;-n{0))y, 


which  leads  to 


1  dK{e-y)  _  dfi' 
~  dO 


du}' 


v{n-)- 


Other  perturbations  can  also  be  introduced,  including  errors  in  the  explanatory 
variables  and  perturbations  to  the  linear  predictor. 
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4.8.2.3     Example:  Case-weights  Applied  to  Leaf  Blotch  Data 

As  an  example,  we  apply  the  case-weight  perturbation  to  a  data  set 
concerning  leaf  blotch  (Wedderburn  1974;  McCullagh  and  Nelder  1989).   The 
percentage  of  leaf  area  affected  by  the  blotch  was  recorded  for  10  varieties  of  barley 
grown  at  9  sites.  Although  the  response  is  continuous,  it  can  be  analyzed  as  if  arising 
from  binomial  sampling  using  the  logit  link  function  and  a  main  effects  model: 

V{fii)  =  Hi{l  -  fjLi)        gifii)  =  log  — -^—  =  T]i. 

Plotting  the  Pearson  residuals  from  this  model  against  the  linear  predictor  indicates 
that  the  model  overestimates  the  variance  for  the  largest  and  smallest  proportions 
(Figure  4.3).  Applying  the  case  weight  perturbation  requires  h      and  the  following 
matrix:  -    ,.  .  ,    '  ,• 

A  =  ^X'D{y-ii), 

where  X  is  the  model  matrix  and  D{y  —  fi)  is  a  diagonal  matrix  of  the  raw  residuals. 
To  measure  the  effects  of  perturbation,  we  use  the  unperturbed  Pearson  Chi-squared 
statistic  evaluated  at  the  perturbed  MLEs: 


m 


=  E 


This  implies  that  the  final  building  block  for  a  first-order  assesment  of  this  measure 
is  given  by 

dm 


^''      80 


=  X'z,, 


where  the  n  x  1  vector  Zi  has  z*''  element 


2y,/if  +  yf(l-2/i,)-/^^ 

Ai(i-Ai) 
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The  direction  of  maximum  local  influence  is  then 

Vraay.CC  D{y-  ij,)Xh      X' Zi- 

Plotting  the  elements  of  Vmax  against  case  indices  indicates  that  elements  24  (site 
3  variety  4)  and  65  (site  7  variety  5)  are  influential  (Figure  4.4).  When  plotting 
elements  against  the  corresponding  linear  predictor,  it  is  clear  that  observations 
with  moderately-sized  proportions  are  more  influential  (Figure  4.5). 

As  in  Wedderburn  (1974)  and  McCullagh  and  Nelder  (1989),  we  consider 
fitting  a  model  with  an  alternate  variance  function,  V{ijl)  =  /i^(l  — /x)^.  This  function 
reflects  the  assumption  that  extreme  proportions  are  more  precise  than  those  arising 
from  a  binomial  distribution.  Pearson  residuals  indicate  that  the  alternate  variance 
function  is  more  realistic,  but  that  a  relationship  between  the  residual  variance 
and  the  the  linear  predictor  remains  (Figure  4.6).  Implementing  the  case- weight 
perturbation  and  assessing  its  effects  on  the  Pearson  Chi-square  statistic  requires 

the  following  quantities:  ■:  ^ -, 

^'  '-'■       '■'  -  "''  .■-  ■; 

0-2  /i(l  -  /x) 


and 


dm 


X'Z2, 


0 


where  the  n  x  1  vector  Z2  has  i*''  element 

2{fif  -  Sfj-fVi  +  ili{2yf  +  yj)  -  yf) 

The  direction  of  maximum  local  influence  is  then 


t^max  OC  D{^^—^)Xh    ^X'Z2. 
/i(l-/x) 
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Figure  4.3:  Leaf  blotch  data;  Model  I;  Pearson  residuals  vs.  linear  predictor 

A  plot  of  Umax  indicates  that  only  observation  24  is  influential  for  this  model  (4.7). 
The  elements  have  less  association  with  the  linear  predictor  than  the  binomial  model 
(Figure  4.8),  indicating  that  the  contribution  of  the  observations  to  inference  is  more 
appropriate  in  the  second  model. 
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Figure  4.4:  Leaf  blotch  data;  Model  I;  Vmax-  The  first  ten  indices  correspond  to  site 
1,  varieties  1-10,  while  the  second  ten  indices  correpsond  to  site  2,  varieties  1-10  etc. 
The  dotted  vertical  lines  are  present  to  help  differentiate  sites  and  varieties  within 
sites. 
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Figure  4.5:  Leaf  blotch  data;  Model  I;  Umax  vs.  linear  predictor 
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Figure  4.6:  Leaf  blotch  data;  Model  II;  Pearson  residuals  vs.  linear  predictor 
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Figure  4.7:  Leaf  blotch  data;  Model  II;  Umax-  The  first  ten  indices  correspond  to  site 
1,  varieties  1-10,  while  the  second  ten  indices  correspond  to  site  2,  varieties  1-10  etc. 
The  dotted  vertical  lines  are  present  to  help  differentiate  sites  and  varieties  within 
sites. 
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Figure  4.8:  Leaf  blotch  data;  Model  II;  Umax  vs.  linear  predictor 


CHAPTER  5 
LOCAL  ASSESSMENT  OF  PREDICTIONS 

In  this  chapter,  local  assessment  in  prediction  problems  is  considered.  First, 

prediction  is  motivated  by  two  methods:  first  principles  and  predictive  likelihood. 

This  is  followed  by  a  brief  discussion  of  influence  measures  for  estimation  and 

prediction.  Then,  predictive  inference  and  existing  local  influence  techniques  for 

linear  mixed  models  are  reviewed.  These  techniques  are  then  extended  to  assess  the 

effects  of  perturbations  on  predictions,  and  three  perturbations  schemes  are  applied 

to  an  example  data  set. 

5.1     Prediction  via  First  Principles 
Consider  the  problem  of  predicting  a  vector  of  random  variables,  u. 
Among  predictors  u*,  u  —  E(m)  minimizes  the  mean  squared-error  of  prediction, 
MSEP(m*)  =  E(m  —  w*)^.   The  standard  error  associated  with  this  prediction 


is  y/MSEP(w)  =  y/Y{u).   However,  prediction  is  typically  not  so  simple;  it  is 
usually  performed  using  information  from  a  sample  and  in  the  presence  of  unknown 
parameters. 

5.1.1     Prediction  After  Data  are  Observed  ? 

Suppose  now  that  prediction  of  u  takes  place  after  observing  data  y.  If 
the  two  random  vectors  are  correlated,  then  an  improved  inference  can  be  made 
about  u.  Here,  the  conditional  mean,  ti  =  E(w|y),  can  serve  as  a  point  prediction. 
This  prediction  minimizes  the  conditional  mean  square  error  of  prediction, 
CMSEP(m*)  =  E(u  -  u*\yy.  Since  it  minimizes  the  CMSEP  for  given  y,  it  also     . 
minimizes  the  (marginal)  MSEP.  This  best  predictor,  ii,  differs  from  u  in  that  it  is  a 
random  variable,  rather  than  a  constant.  Also,  the  best  predictor's  MSEP  can  be  no 
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larger  than  that  of  it: 

MSEP(m)  =  V(w) 

=  E(V(M|y))+V(E(«|?/)) 

>E{V{u\y)) 

(5.1) 

=  E{E{{u-u)'\y)) 
=  E(CMSEP(u)) 
=  MSEP(w). 

Other  predictors  can  be  derived  if  the  class  of  predictors  is  restricted.  For 
instance,  minimizing  MSEP  among  predictors  that  are  linear  functions  of  y  results 
in  the  best  linear  predictor  (Searle  et  al.  1992): 

BLP{u)^E{u)  +  CV-\y-E{y)),  (5.2) 

where  V  is  the  variance  of  y  and  C  is  the  covariance  of  u  and  y.   Although 
not  required  by  its  definition,  the  BLP  is  an  unbiased  predictor  of  u  since 
E{BLP{u))  =  E{u). 

5.1.2    Prediction  After  Data  Are  Observed,  but  with  Unknown  Parameters 

Typically  the  goal  of  prediction  is  accompanied  by  a  need  to  estimate 
unknown  parameters  0.   Here,  we  may  first  estimate  the  parameters  using  the 
observed  data  and  then  estimate  the  best  predictor.  The  MSEP  of  the  estimated 
best  predictor  can  then  serve  as  a  measure  of  uncertainty  in  the  prediction,  although 
Booth  and  Hobert  (1998)  argue  that  CMSEP  is  more  relevant. 

To  illustrate,  consider  the  normal  linear  regression  model  with  unknown 
parameters,  /3,  but  with  known  variance,  a^.   After  observing  data  y,  the  goal 
is  to  predict  future  independent  observations  u  ~  N{Z^,a^Im).    From  either 
ordinary  least  squares  or  ML,  0  =  (X'X)-'^X'y.    Hence,  the  estimate  of 
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E{u\y)  =  E(m)  is  Z^,  and  its  mean  squared  error  of  prediction  can  be  shown  to  be 
a^{Z'{X'X)~^Z  +  Im)-  Also,  Z^  is  normally  distributed.  These  facts  can  be  used 
to  create  a  100(1  —  a)%  prediction  interval  for  a  single  it  with  covariate  vector  z: 

z'3  ±  Za/2 ^J<r^z'iX'X)-'z  +  l).  (5.3) 

In  the  event  that  a^  is  unknown,  the  prediction  interval  given  in  (5.3)  is 
still  valid  but  cannot  be  calculated.  One  option  is  to  simply  replace  a"^  by  s^  to 
produce  an  approximate  prediction  interval.  A  better  alternative  is  to  make  use  of 
the  following  pivotal  quantity: 

u  -  z'^ 


tn-fc, 


(5.4) 


y/s^{z'{X'X)-'z  +  l) 
where  k  is  the  number  of  regressors.  An  exact  100(1  —  a)%  prediction  interval  is 
then 

z'0  ±  t„^k,a/2\JsHz'{X'X)-^z  +  1).  (5.5) 

This  simple  situation  highlights  the  fact  that  unknown  parameters  can 
complicate  the  prediction  problem.  While  pivotal  quantities  may  be  available  in 
simple  situations,  accomodating  unknown  parameters  may  be  more  cumbersome 
in  complicated  models.  Therefore,  attempts  have  been  made  to  unify  predictive 
inference  via  predictive  likelihood. 

5.2    Predictive  Likelihood 
Predictive  likelihood  is  a  likelihood-based  framework  for  handling  the 
problem  of  prediction.  The  goal  is  to  predict  w  via  a  point  or  interval  predictor  in 
the  presence  of  unknown  parameters  0.  If  u  is  treated  as  a  vector  of  parameters, 
then  the  joint  likelihood  of  u  and  0  is 

L*{u,9-y)  =  f{y,u\e)  =  h{u\e,y)h{y\e).  (5.6) 
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A  predictive  likelihood  for  u  would  result  by  somehow  eliminating  0  from  the  above 
expression.  For  Bayesians,  this  can  be  achieved  in  a  straightforward  manner.  The 
posterior  density  of  0  given  the  data  y  is  obtained,  and  then  (5.6)  is  integrated  over 
the  density  of  0.  The  resulting  expression  is  a  posterior  predictive  likelihood  for  u. 
For  frequentists  there  are  several  competing  options,  as  outlined  in  Butler  (1986) 
and  Bj0rnstad  (1990).  Here  we  consider  two:  the  estimative  predictive  likelihood 
and  the  profile  predictive  likelihood. 

5.2.1  Estimative  and  Profile  Predictive  Likelihood 

The  estimative/empirical  predictive  likelihood  replaces  0  in  (5.6)  by  0,  its 
MLE.  The  resulting  predictive  likelihood  is  actually  proportional  to  the  density  of  u 
given  y  with  the  MLE  inserted  in  place  of  0.  This  is  considered  a  naive  approach,  as 
resulting  inferences  for  u  do  not  take  into  account  the  uncertainty  in  the  estimated 
parameters.  An  alternative  frequentist  predictive  likelihood  can  be  motivated  by 
profile  likelihood.  The  idea  works  as  follows.  If  0  is  considered  a  set  of  nuisance 
parameters,  then  a  technique  for  estimation  is  to  first  maximize  the  joint  likelihood 
of  {u,0)  with  respect  to  0  for  fixed  u.   The  resulting  solution,  0{y,u),  is  then 
substituted  into  the  likelihood.  The  resulting  expression  is  a  function  of  only  u  and 
y,  and  is  known  as  the  profile  predictive  likelihood  of  u  (Mathiasen  1979;  Lejeune 
and  Faulkenberry  1982;  Levy  and  Perng  1984). 

5.2.2  Point  and  Interval  Predictions 

Regardless  of  which  predictive  likelihood  is  chosen,  point  and  interval 
predictions  are  of  interest.   One  option  for  a  point  prediction  is  to  maximize  the 
predictive  likelihood,  £{u;  y)  with  respect  to  u.  However,  the  resulting  maximum 
likelihood  predictor  can  have  poor  properties  (see  Bj0rnstad,  1990,  examples  4  and 
7) .  An  alternative  can  be  motivated  by  standardizing  the  predictive  likelihood  to  be 
a  predictive  density  for  u:  f{u\y)  =  £{u;  y) / f2{y) .  The  mean  of  this  conditional 


density,  the  predictive  mean,  is  the  preferred  point  predictor.  Standard  errors  and 
prediction  intervals  can  also  be  constructed  from  this  predictive  density. 
5.2.2.1     Example 

Consider  again  the  situation  of  predicting  future  observations  in  a  linear 
regression  model  with  0'  =  {0',(t^)  unknown.   Naively  replacing  0  by  0  in  the 
conditional  distribution  of  u  given  y  yields  the  estimative  predictive  likelihood. 
Hence,  inference  proceeds  under  the  assumption  that  u  is  conditionally  distributed 
as  normal  with  mean  Z^  and  variance  a'^Im-  A  prediction  interval  for  a  single  u 
with  covariate  vector  z  would  then  be 

In  contrast,  the  conditional  distribution  of  u  from  standardizing  the  profile 
predictive  likelihood  is  multivariate  t  with  mean  Z0  and  dispersion  parameter 
na^{I  +  Z'{X'X)~'^Z),  as  shown  by  Levy  and  Perng  (1984).  A  prediction  interval 
for  a  single  u  would  then  be 


z'^±tr,-pa^Jl  +  z'{X'X)-'z. 

This  matches  the  exact  prediction  interval  given  in  (5.5). 

5.3     Influence  Measures  for  Prediction 
In  this  section,  some  general  measures  are  proposed  for  assessing  the  effects  of 
perturbation  on  point  estimation  and  point  prediction.  Notation  is  as  follows:  $  are 
point  estimates  for  parameters  0  and  u  are  point  predictions  for  random  variables 
u.  Also,  let  ^^  and  u^  denote  the  perturbed  versions  of  these  point  inferences,  and 
let  PL{u;y,0)  denote  the  log  of  the  unstandardized  predictive  likelihood. 
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5.3.1     Predictive  Likelihood  Displacement 

An  overall  measure  of  the  change  in  estimates  0  and  predictions  u  due  to 
perturbation  is  the  predictive  likelihood  displacement: 

PLDiLj)  =  2[PL{u;  y,  ^)  -  PL{u^-  y,  0J].  (5.7) 

A  local  assessment  of  this  influence  measure  could  be  first-  or  second-order, 
depending  on  the  model  and  the  method  of  inference. 

Thomas  and  Cook  (1990)  considered  influence  on  predictions  in  generalized 
linear  models  (GLMs).  To  explain,  let  u  denote  a  single  future  observation  that 
is  independent  of  the  observed  data.  Also,  let  f{-\0)  denote  the  density  function 
shared  by  the  observed  data  and  the  future  observation.  Thomas  and  Cook  (1990) 
proposed  the  following  influence  measure  for  assessing  the  eflfects  of  perturbations 
on  predicting  u:  >  ,  t  / 

d*{u:)  =  \og f{u\0) -log f{u\0J.  (5.8) 

In  the  context  of  the  article,  parameters  were  estimated  by  ML  and  point  predictions 
were  u  =  E{u\y,i3).   For  ML  estimation  of  a  GLM,  d*{u})  is  non-negative  and 
minimized  at  the  null  perturbation,  indicating  that  a  second-order  local  assessment 
is  appropriate.  However,  in  other  situations  it  can  be  negative. 

The  influence  measure  d*{u})  is  motivated  by  a  deviance  residual  for  the  i*'' 
observation  (McCuUagh  and  Nelder  1989): 

^ogfivM  =  Vi)  -  log  f{yi\ni  =  fii0)),  (5.9) 

where  E{yi)  —  /Xj  and  fli  is  the  estimated  mean  of  the  observation  based  on  the 
model.  In  the  context  of  GLMs,  each  deviance  residual  is  non-negative,  with  smaller 
values  indicating  a  better  fit.  The  deviance  of  a  model  is  defined  as  the  sum  of  the  n 
deviance  residuals  and  serves  as  measure  of  overall  goodness-of-fit. 
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The  influence  measure  d*{u})  can  be  related  to  predictive  inference  in  the 
following  way.  First,  define  the  following  modified  predictive  likelihood  displacement: 

PLD*{u;)  =  2[PL{u;y,^)  -  PL{u;y,0j].  (5.10) 

This  differs  from  PLD{u})  in  that  the  prediction  for  u  is  not  perturbed.   Also, 
assuming  y  and  u  share  the  same  density  and  that  all  elements  of  y  and  u  are 
independent,  we  have 

m  m 

PLD*{u;)  =  2[log/(y|^)  +  ^log/(n,|^)  -  log/(yi;9J  -  ^log/(n,|;9J] 

t=l  i=l 

m 

-  2[log/(y|;a)  -  log/(y|^J]  +  2^(log/(«,|^)  -  log/(«,|^J)     (5.11) 

i=l 

m  '        •,  . 

1=1 
where  d*{u})  is  the  deviance  residual  of  perturbation  for  the  i*''  prediction.  Note  that 
the  prediction  for  u  is  the  same  in  both  terms  of  rf*(w).  Technically  then,  d*(uj)  is  a 
measure  of  the  pertubation's  effects  on  the  estimated  conditional  mean  of  u  rather 
than  a  change  in  the  prediction  for  u. 

We  note  that  under  the  same  assumptions  given  above,  the  predictive 
likelihood  displacement  (5.7)  can  be  written  as 

m 

PLD{u})  =  LD[(jj)  +  2  ^  di[u}), 

i=l 

where 

di{l^)  =\og  f{Ui\'p)  -log  f{UiS^  J. 

5.4     Estimation  and  Prediction  in  Linear  Mixed  Models 
In  this  section,  the  linear  mixed  model  is  presented  with  point  estimates  and 
point  predictions  motivated  in  three  ways.  Then,  local  influence  techniques  for  the 
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linear  mixed  model  are  reviewed,  and  additional  tools  for  assessing  predictions  are 
presented. 

5.4.1     Linear  Mixed  Models 

The  linear  mixed  model  (LMM)  is  formulated  as  follows: 


y  =  X)9  +  zu  +  €, 


(5.12) 


where 


u 

0 

5 

G    0 

e 

\ 

0 

0     R 

y  is  an  n  X  1  vector  of  observations,  X  is  a  known  n  x  pi  matrix,  Z  is  a  known 
n  X  p2  matrix,  ;9  is  a  pi  x  1  vector  of  unknown  fixed  effects,  u  is  a  p2  x  1  vector  of 
unknown  random  effects  and  e  is  a  n  x  1  vector  of  unknown  residual  errors.  The 
matrices  G  and  R  are  of  order  p2  and  n  respectively. 

Littell  et  al.  (1996)  provide  a  host  of  applications  for  this  rich  class  of  models. 
When  there  are  no  random  effects  and  R  =  a"^!,  this  model  simplifies  to  the  fixed 
effects  regression  model  (1.10).  The  inclusion  of  random  effects  and  the  general 
residual  variance  matrix  R  permits  the  modeling  of  correlated  data.  This  is  evident 
by  looking  at  the  marginal  distribution  of  the  data  under  model  (5.12): 


%    .     «>     *r>-  ' 


■I    'r** 

■J-  *- 


Y  ^N{X^,ZGZ'  +  R). 


(5.13) 


The  unknowns  in  (5.12)  are  the  fixed  effects  0,  the  random  effects  u,  and  the 
elements  of  the  covariance  matrices  G  and  R.  In  this  chapter,  we  will  concentrate 
on  assessing  the  effects  of  perturbations  on  inferences  for  0  and  u.  Point  estimates 
of  0  and  point  predictions  of  u  can  be  motivated  by  three  methods:  first  principles, 
mixed  model  equations,  and  profile  predictive  likelihood.  See  Searle  et  al.  (1992)  for 
information  on  estimating  G  and  R  when  these  matrices  are  unknown. 
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5.4.2  Inference  by  First  Principles 

The  best  linear  unbiased  estimates  (BLUEs)  of  0  are  the  generalized  least 
squares  estimates: 

V ,:    p  =  {X'V-'X)-'X'V-'y,  (5.14) 

where  V  =  ZGZ'  +  R.   Here,  "best"  refers  to  minimum  mean-squared  error  of 
estimation.  The  estimates  are  equivalent  to  the  MLEs  derived  from  the  marginal 
distribution  of  V . 

The  best  linear  unbiased  predictors  (BLUPs)  for  u  are  given  by 

u  =  GZ'V-^c,  (5.15) 

where  c  =  y  —  X/3  is  the  vector  of  generalized  least  squares  or  composite  residuals, 
the  latter  term  being  coined  by  Hilden-Minton  (1995).   Here,  "best"  refers  to 
minimum  MSEP. 

Note  that  the  BLUEs  and  BLUPs  do  not  require  normality-  only  the  model 
and  covariance  structure  specified  in  (5.12).  Detailed  derivations  can  be  found  in 
Searle  (1971)  and  Searle  et  al.  (1992). 

5.4.3  Inference  by  the  Mixed  Model  Equations 

Henderson  et  al.  (1959)  derived  estimates  for  the  LMM  by  maximizing  the 
joint  log-likelihood  of  u  and  /3: 

(5.16) 
(x--[{y-Xp-  Zu)'  R-^  iy-X0~  Zu)  +  u'G-^u]  . 

This  likelihood  is  also  known  as  an  h-likelihood  (Lee  and  Nelder  1996).  The  solutions 
that  maximimize  (5.16)  are  in  fact  the  same  as  the  BLUEs  and  BLUPs.   The 
resulting  estimating  equations  are  called  the  mixed  model  equations  (MMEs)  and 
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are  usually  written 


V 


X'R^X         X'R^Z 
Z'R^X     Z'R-^Z  +  G-^ 

The  corresponding  set  of  estimating  functions  are 


X'R-'y 
Z'R-'y 


(5.17) 


h{u,l3;y)  = 


X'R-\y-X0-Zu) 
Z'R-\y  -  X/3  -  Zu)  -  G-^u 


(5.18) 


5.4.4    Inference  by  Profile  Predictive  Likelihood 

The  BLUEs  and  BLUPs  can  also  be  derived  by  the  profile  predictive 
likelihood.  Recall  that  inference  using  the  profile  predictive  likelihood  works  as 
follows: 

1.  Maximize  the  joint  likelihood  of  u  and  ^  with  respect  to  0.  Denote 
-        the  solution  as /3(j/,  u). 

2.  Substitute  ;9(2/,w)  into  the  joint  likelihood.  The  resulting  expression, 
L*{u,^{y,u);y),  is  the  profile  predictive  likelihood  of  «. 

3.  Obtain  a  prediction  for  u  by  either  of  the  following  methods: 

a.     Obtain  the  maximum  likelihood  predictor  (MLP)  by  max- 
imimizing  L*{u,  ^{y,  w);  y)  with  respect  to  u. 
h.     Obtain  the  conditional/predictive  mean  of  u  given  (y',  ^  (y,  u)) 
horn  fi{u\y,0{y,u))ccL*{u,0{y,u);y). 

4.  Obtain  point  estimates  of  /3  by  substituting  the  predictor  of  u  from 
step  three  into  ${y,  u). 

By  definition,  the  point  inferences  from  steps  3a  and  4  maximize  the  joint 
likelihood  of  u  and  (3.  This  implies  that  using  the  MLP  from  the  profile  predictive 
likelihood  leads  to  the  BLUEs  and  BLUPs.   However,  it  is  not  as  clear  that  the 
solutions  from  step  3b  and  4  lead  to  the  same  solutions,  and  so  the  derivation 
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follows.  The  first  step  is  to  maximize  the  joint  likelihood  of  u  and  0  with  respect 
to  /3  for  given  u.  This  leads  to  the  first  set  of  mixed  model  equations  (5.17),  which 
have  solution  ^{y,u)  =  {X' R~^ X)~^ X' R'^ {y  -  Zu).  The  next  step  is  to  show 
that  predicting  u  with  its  conditional  mean  given  the  data  and  the  estimate  /3(j/,  u) 
leads  to  the  second  set  of  mixed  model  equations.  Starting  with  the  profile  predictive 
likelihood,  we  have 

L*{u,0{y,u);y) 

ocexp(-^{y-  X^{y,  u)  -  Zu^ '  R'  (y  -  X^{y,  u)  -  Zu)  -  lu'G-'u) 

^exp(-^y'R-'(^y-X0{y,u)-Zu) 

+  ^^\y,u)X'R-'(y-X0{y,u)-Zuj 

+  ^u'Z'R-'  {y  -  X^{y,  u)  -  Zu)  -  ^WG-'u] 

=  exp  (^  -  ^y'R-'  (y  -  X0{y,  u)  -  Zu) 

+  ^u'Z'R-'  (j/  -  X^iy,  u)  -  Zu)  -  lu'G-'u) , 

(5.19) 
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''o-l/ 


since  X'R    (y  —  X^{y,  u)  —  Zu)  =  0.  Examining  the  exponent,  we  find 


-\(y'R-\y  -  XP{y, u)  -  Zu)  -  u'Z'R-\y  -  X^{y, u)  -  Zu)  + 


u'G-^uj 


-Uy'R-'y  -  y'R~'X^{y,  u)  -  2u'Z'R-'y  +  u'Z'R-'XP{y,  u) 
+  u'Z'R-^Zu  +  u'G'^u 

'  .' D— 1 'v/' v' D— 1  ■v^-l  "V' E>— 1( 


oc  --  (  -  y'R-'X{X'R-^X)-^X'R-\y  -  Zu) 

-2u'Z'R-^y  +  u'Z'R-^X{X'R-^X)-^X'R-\y-Zu) 
+  u'Z'R-^Zu  +  u'G~^u 

(x-U-  2u'{Z'R-^  -  Z'R-\X{X'R-^X)-^X'R-^)y 

+  u'{-Z'R-^X{X'R-^X)-^X'R-^Z  +  Z'R^Z  +  G-^)u 

=  -Uu'{Z'{I  -  R-^X{X'R-'X)-^X')R-^Z  +  G-^)u 
-  2u'{Z'{I  -  R-^X{X'R-^X)-^X'))R-^y 

=  -i  (u'{Z'P'R-^Z  +  G-^)u  -  2u'Z'P'R-'y\ , 

(5.20) 
where  P'  =  I  -  R-^X{X'R-^X)-^X'.  It  follows  that 

f{u\y, ${y,  u))  oc  exp  (^  -  ^{u' {Z' P R-^ Z  +  G~'')u  -  2u'Z'P'R-'y)\     (5.21) 

This  is  proportional  to  the  kernel  of  a  normal  distribution  with  a  mean  of 

(ZP'R-'Z  +  G  ')   'Z'P'R-'y,  (5.22) 

and  this  predictive  mean  is  the  point  prediction  for  it. 
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Meanwhile,  substituting  the  solution  from  the  first  set  of  MMEs  into  the 
second  set  yields 

Z'R-^X{X'R-^X)-^X'R-^{y  -  Zu)  +  {Z'R-^Z  +  G-'^)u  =  Z'R-^y.     (5.23) 

Solving  for  u  provides  the  BLUP  of  the  random  effects, 

u  =  {Z'P'R-^Z  +  G-^)-^Z'P'R-^y, 

and  this  is  identical  to  the  predictive  mean  (5.22).  Hence,  predicting  u  with  the 
predictive  mean  from  the  profile  predictive  likelihood  yields  point  inferences  for  « 
and  /3  that  are  identical  to  those  from  the  mixed  model  equations. 

5.5     Local  Influence  Methods  for  Linear  Mixed  Models 
5.5.1     Assessment  in  the  Marginal  Model  .  ,_ 

Local  influence  techniques  for  linear  mixed  models  have  been  developed  using 
the  marginal  model  (5.13).  Beckman  et  al.  (1987)  considered  a  special  case  of  the 
linear  mixed  model  known  as  the  variance  components  model.  This  model  is  used 
for  data  with  a  sources  of  grouping  or  clustering,  where  the  i*''  source  has  gi  levels. 
For  Zi  being  of  size  n  x  g^  and  Wj  being  of  size  5',  x  1,  the  model  is  given  by 

y  =  X0  +  ZiUi  +  Z2U2  +  ...ZaUa  +  e.  (5.24) 

This  structure  implies  Z  is  a  block-diagonal  matrix  with  i"*  block  Zi,  and  u  has  a 

total  of  p2  =  Yli=i  9i  elements.  In  addition  to  the  general  LMM  assumptions,  the  Wj 

vectors  are  mutually  independent,  with  V(ui)  =  aflg.  and  V(c)  =  cr^J„.  The  model 

implies  that  the  marginal  variance  of  Y  can  be  written  as 

- '  -  ■  "• 

V  =  ZGZ'  +  R=J^afZiZ[  +  a^I. 
1=1 
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Beckman  et  al.  (1987)  assumed  the  fixed  effects  as  well  as  the  variance  parameters, 
{a^,al, . . .  ,(Tq),  are  jointly  estimated  by  ML.  Two  perturbation  schemes  were 
introduced.  The  first  scheme  perturbs  the  z*'*  residual  variance  to  Wjcr^,  while  the 
second  scheme  introduces  a  perturbation  to  the  variance  of  each  random  effect:  the 
variance  of  the  j*''  element  of  Mj  becomes  uijaf.  While  both  of  these  perturbations 
technically  perturb  the  model  assumptions,  the  major  goal  is  to  assess  the  influence 
of  individual  observations  and  clusters,  respectively. 

Lesaffre  and  Verbeke  (1998)  emphasized  applications  to  longitudinal  data. 
They  considered  the  following  linear  mixed  model: 

Vi  =  Xi/3  +  ZiUi  +  Ei  (5.25) 

where 

i  =  l,...g, 

yi  is  vector  of  nj  observations, 

Xi  is  an  nj  X  p  matrix  , 

Zi  is  a,n  rii  X  t  matrix, 

Ui  is  a  vector  of  t  random  effects  and 

Cj  is  vector  of  rii  errors. 

It  is  assumed  that  V{Ui)  =  Gj,  V{€i)  =  /„,,  and  that  y^  and  y^,  are  independent  for 
i  ^  i' .  The  model  permits  the  inclusion  of  random  slopes,  and  the  matrix  Gj  can  be 
as  general  as  needed.  Also  note  that  the  model  has  a  total  of  p2  =  gt  random  effects. 

In  the  context  considered  by  Lesaffre  and  Verbeke  (1998),  the  subscript  i 
corresponds  to  the  i*^  subject,  from  whom  repeated  measurements  are  taken  over 
time.  The  model  implies  that  the  log-likelihood  is  the  sum  of  contributions  from  each 
of  the  subjects,  and  by  allowing  each  of  the  m  weights  to  vary  from  1,  a  subject-level 


148 

case  weight  was  introduced  as  a  perturbation.  Because  the  perturbation  does  not 
correspond  to  a  new  density  for  the  data,  it  would  be  considered  a  perturbation  of 
the  estimation  technique.  Diagnostics  were  primarily  based  on  basic  perturbations 
(i.e.,  one-at-a-time  perturbations)  of  subject  weights.  Lesaffre  and  Verbeke  (1998) 
showed  that  the  basic  curvatures  could  be  written  as  the  sum  of  several  terms,  and 
the  easily-interpretable  terms  were  used  as  a  means  to  compare  the  relative  influence 
of  subjects. 


5.5.2     Assessment  of  Predictions 

The  influence  of  perturbations  on  both  the  BLUEs  and  BLUPs  can  be 
assessed  by  applying  the  techniques  developed  in  Chapter  4  to  the  predictive 
HkeUhood  displacement  (5.7).  The  estimating  functions,  h{0,y),  are  those  used 
in  the  mixed  model  equations  (5.18),  and  if  the  perturbed  model  is  also  a  linear 
mixed  model,  the  perturbed  estimating  functions,  hi^{0;y),  are  also  of  this  form. 
Local  assessment  of  the  predictive  likelihood  displacement  requires  the  following 
quantities: 


e  = 


13 
u 


i{e^)  =  e^ 


m  =  PLD{u,)  =  2[PL{u;  y,  ^)  -  PL{u^;  y,  ^ J]. 

The  next  step  is  to  determine  whether  first-  or  second-order  local  assessment  is 
appropriate.  Since  the  influence  measure  does  not  explicitly  depend  on  a;,  s^^  =  0. 
Also,  by  derivation  of  the  mixed  model  equations. 


8^ 


dPLD{u}) 


djiej 


d 

30^ 


{-2PL{u^;y,0J) 


=  -2 


dL{e;y) 

de 


=  0. 
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The  implication  is  that  the  gradient  vector  vanishes.  Proceeding  with  a  second-order 
assessment,  we  note 


S  = 


dwdui'        |0         dwd-yiey     1° 
d'y{0)dw'     l0       d'y{0)d'y(ey  I" 


0       0 
0    5 


77 


iif 


57(^<.) 


a©,. 


Using  (4.32),  the  acceleration  matrix  is  then 


/,    JiiT' 


■*9 


(5.26) 


=  -2A'h" 


.52L(0;j,) 


8080' 


h    A 


-2A'h    A. 


where  h  =  8h{u,/3;y)/d0'\o.  The  matrix  h  is  derived  as  follows.  Taking  derivatives 
of  the  estimating  functions  with  respect  to  0  we  find: 


8h{u,/3;y) 
80' 


d_ 
80' 


X'R-\y-X/3-Zu) 
Z'R-\y  -X^-  Zu)  -  G'u 

X'R^X    X'R^Z 
Z'R-^X     Z'R^Z  +  G-^ 


(5.27) 


The  expression  is  not  a  function  0,  so  (5.27)  is  in  fact  h.   It  can  be  shown  by- 
inverting  this  matrix  that  the  variance  matrix  of  the  BLUEs  and  BLUPs  is  (Searle 


et  al.  1992) 


where 


-h 
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(5.28) 


-h^^  =  {x'v-^xy^ 

-h?^  =  -GZ'V'Xh^^ 

-/i22  =  {Z'R-^Z  +  G-^)-i  -  h'^^X'V-'^ZG. 

Once  a  perturbation  is  chosen  and  the  perturbed  estimating  functions  are 
constructed,  computation  of  the  acceleration  matrix  (5.26)  is  straightforward. 

5.5.3    Assessment  for  Subsets  of  Point  Inferences 

Influence  measures  for  subsets  of  the  perturbed  point  inferences  can  also  be 
developed.  If  only  the  fixed  effects  are  of  interest,  then  assessment  can  be  done 
using  the  marginal  model  (Beckman  et  al.  1987;  Lesaffre  and  Verbeke  1998).  More 
generally,  a  profile  predictive  likelihood  displacement  can  be  defined  to  assess  the 
change  in  a  subset  of  fixed  and  random  effect  point  inferences,  6^  =  {^i,u[): 

PLD^''\^)  =  2[PL{u-y,P)  -  PL{u,^,gu{K)-yA.,9p{0i.))l  (5.29) 

where  ^j,^  =  Cfii^,u\J  comes  from  /9^  =  {^[^,^'2^'  and  u^  =  (wio/'^L)'  and 
i9'j3{^ii^),9'u{0icj))  maximimizes  PL{ui^,U2;y,^i^,02)  with  respect  to  {/3'2,u'2).  For 
linear  mixed  models,  the  acceleration  matrix  follows  from  the  arguments  for  deriving 
the  acceleration  matrix  of  LD^^^^{u})  in  Section  4.6.4,  and  is  given  by 


M  =  -2A'{h     - 


0      0 

-1 


0    h 


22 


)A, 
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where  /122  comes  from  the  partition 


h 


hii    hi2 
h2\    /1-22 


When  influence  on  a  linear  combination,  x'^  +  z'ii,  is  of  interest,  the  results  of 
Section  2.7.1  can  be  used.  Letting  x*'  =  {x',z'),  it  follows  from  the  argument  of 
Theorem  2.1  that  .        . 


«f 


Cm^  =  ^||A'/l"'||2  t;:nax  OC  A'h'x*,  (5.30) 


where  ki  =  h       x*. 


5.6     One-way  Random  Effects  Model 

The  one-way  random  effects  model  can  be  used  to  provide  insight  into  the 
diagnostic  value  of  locally  assessing  the  linear  mixed  model.   Some  preliminary 
results  for  this  special  case  of  the  LMM  are  presented  here,  so  as  not  to  clutter  later 
explanations. 

The  (unbalanced)  one-way  random  effects  model  is 

Vij  -  fi  +  Ui  +  eij        i^l,...g    j  =  1,... m,  (5.31) 

with  V(itj)  =  a^  and  V(eij)  =  a^.  It  is  a  special  case  of  the  variance  components 
model  (5.24)  with  a  =  1,  X  =  1„,  and  /3  =  jx,  and  is  also  a  special  case  of  the 
two-level  model  given  in  (5.25).  If  all  n,-  are  equal  to  n/g,  the  design  is  said  to  be 
balanced.  It  can  be  shown  that 

^=y-^  1      -  a^d      u,=  -—^c,  (5.32) 

where  y,  and  Cj  are  the  average  response  and  average  composite  residual,  respectively, 
for  the  z*'*  cluster  (Searle  et  al.  1992). 


The  building  blocks  for  deriving  h  are  as  follows: 


X   =   lr 


z  = 


•■m 


R  =  a^In  G  =  allg 

X'X  =  n  Z'Z  =  Dim) 

X'Z  =  n' =  {ni,n2,...ng).      ,  ' 


ln„ 
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It  follows  that 


-h 


X'R^X         X'R^Z 
Z'R-^X     Z'R^Z  +  G^ 


iTl 


rr*- 


^n    ^Dini)  +  ^Ig 


n  n 

n    D{ 


a^+n,aj 


(5.33) 


The  following  results  for  a  partitioned  matrix  with  a  scaler  upper-left  sub-matrix 
can  be  used  to  invert  the  matrix  (Searle  1982): 


A    B 
C    D 


1  -BD-^ 


+ 


0       0 
0    D-^ 


(5.34) 
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where  c  =  A-  BD  ^C.  Applying  this  result,  we  find 


-h 


a 
~k 


1 


1  1 


'^i+ 


c   '   ni 


<^c^  + 


^i  +  z 


atM{n„nj)  +  kD{^f^^) 


(5.35) 


where 


M{ni,nj) 


1  1 


and 


y  9  y  9  ■ 

■^a^  +  ^Vn,       -^a^  +  aVni 

2 

When  the  design  is  balanced,  A;  =  ^-2/  +g2/n)  and  the  following  simplifications  can  be 
made: 


-/*        =  (^c/^  +  <^V^) 
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al/g  +  ayn 


-^1 
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-'r'r/9 

—V 

In^g 

11          \  2 

1    /^       ''c/n        T 
9  "^  '^allg+cj^/n^g 

allg+ayn)     ^9X 

7"              -^K 

'                              g    9 

"l 

'^^9^9  +    n  -'^S 

o-?/S+o'^/n 

.<-      * 


-r:  "^ 


Of  particular  interest  are  the  first  and  second  columns  of  — /i 
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-1 
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(t| +0-2/712 


c^+a'^/ng 
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(5.36) 


(5.37) 


5.7    Residual  Variance  Perturbation  Scheme 
In  this  section,  a  residual  variance  perturbation  is  proposed.  The  perturbation 
is  applicable  to  the  general  form  of  the  linear  mixed  model.  Simplifications  when 
R  =  a^I  are  presented,  and  some  closed-form  expressions  for  the  one-way  model  are 
given. 


5.7.1     Form  of  the  Perturbation 

Let  aij  denote  the  if^  element  of  the  matrix  R.  A  set  of  n  perturbations  is 
introduced  into  the  model  such  that  the  perturbed  residual  covariance  matrix  is 


Rw  — 


ai2 
V'a)iW2 

X/W1W3 

y/(jJltjJ2 

"it 

W2 

013 

(Tin 

0-2n 

03„ 

■s/uliiiln 
0^3n 


\/OJlCO„  y/cJ2Uln  s/UJ3U]n 


The  intent  of  this  scheme  is  to  weight  the  contribution  of  individual  observations 
to  inference  in  situations  when  they  do  not  necessarilly  enter  the  likelihood  in  an 
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additive  fashion.  The  perturbation  changes  the  variances  but  leaves  the  correlations 
among  the  error  terms  unchanged.  The  resulting  local  influence  diagnostics  can 
detect  influential  observations  without  upsetting  the  relationships  assumed  to  exist 
among  the  residual  errors. 

It  simplifies  to  the  error  variance  perturbation  given  for  regression  when 
R  =  a^I. 

5.7.2     Computational  Building  Blocks        - 

To  derive  A  =  dK{0]y)/duj'\o,  define  e  =  y-X^-Zu,  and  X*  =  [X  Z]. 
We  begin  with  the  perturbed  joint  likelihood.  Ignoring  terms  that  do  not  depend  on 
both  0  and  u?  we  have 

L^{u,0;y)  =  log(/(y  |  0,u)g{u  |  0)) 

^-l[{y-x*erRzHy-x*e)] 

=  -^  [y'R-Jy  -  20'X*'R-'y  +  e'X*'Rz'X*0]  . 

Taking  derivatives  with  respect  to  0,  we  find 

dL^{u,0;y) 


Kie;y)=        ,, 

=  ~  [-2X*'R-'y  +  2X*'RZ'X*0] 
=  X*'Rz'{y-X*0). 
Second  partial  derivatives  are  given  by 
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Evaluating  this  at  the  original  model  we  obtain  the  i*'*  column  of  A: 


*/  D-l 


Ai  =  X*'R 


0         0 


0 

2<7li      2*^21 
0  0 


0         0 


0  \aii  0         ...       0 


)^(i+l)i 


0  2^ni 


0  ...  0 


R-^e. 


(5.38) 


When  H  =  a"^!,  R^  =  a'^D{l/u}),  and  the  expression  above  simplifies,  allowing  us 
to  write 

A  =  ^X*'D{e) 


and 


M 


-2A'L  ^A  =  ^D(e)  [X     Z] 


T  -1   r 


X'X    X'Z 

Z'X     Z'Z  +  a^G-^ 


X' 

z' 


D{e). 


5.7.3    One-way  Random  Effects  Model 

Deriving  results  for  the  one-way  random  effects  model  in  more  detail  should 
provide  information  about  the  pertubation's  diagnostic  value.    First,  consider 
influence  on  the  intercept  parameter  estimate,  /i.  This  can  be  accomplished  by 
using  either  m  =  fi^  or  m  —  PLD^^\u})  as  the  influence  function.  The  direction  of 


maximum  local  influence  for  both  measures  is  given  by 
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Vmax  cx:  A'/l     di 


ex  D{e)  [X     Z] 


(T^+a^/m 


t2-I-/t2/ 


<T|+<T2/ng 


oc£>(e) 


■•ni 


(x£>(e) 


(•'•        (72+o-'2/ni)-'-"i 
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ei 

CT2+0-2/ni 

aVng 

Cfl  _ 

a2+^2/"3 

cr| +0-2/711 


(5.39) 


where  Cj  denotes  the  residuals  for  cluster  i.   Thus,  the  direction  of  largest  local 
influence  is  proportional  to  the  residuals,  but  with  each  cluster  scaled  according  to 
its  size.  The  coefficients  upweight  observations  in  smaller  clusters  and  downweight 
observations  in  larger  clusters.  The  reason  for  this  is  that  ft  is  not  a  simple  average 
of  the  observations:  the  numerator  of  /t  in  (5.32)  can  be  written  as 

9      ni 


EE- 


Vij 


rual  +  <t2 

1=1  ]=i        ^ 


This  indicates  that  observations  from  smaller  clusters  receive  more  weight.   Note 
that  if  either  the  design  is  balanced  or  al  —  0,  then  Umax  oc  e. 
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We  now  consider  influence  on  the  predicted  random  effects.  Without  loss  of 
generality,  let  us  examine  influence  on  Ui .  The  direction  of  largest  local  influence  is 
given  by 


Vr, 


DC  A'h    di 


oc  D{e)  [X     Z\ 
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(T^+iT^/ni         ni 


al+a"^  In2 


al+a'^lng 


Die) 
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''l+aVns 


(5.40) 
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Although  the  individual  cluster  sizes  play  into  the  weights,  it  is  clear  from  inspection 
that  the  residuals  from  the  first  cluster  have  weights  roughly  g  —  1  times  greater 
than  the  others.  In  the  balanced  case,  all  rii  —  n/g,  and  this  implies 

y^     aVm      _  {g-  l)crV" 


^  al  +  crVn,-       a'^/n  +  al/g ' 


It  follows  that 


^max  Ot 


1 
■9-1 


62 


s-i^s  J 


Thus,  in  the  balanced  case,  the  residuals  of  the  first  cluster  have  weights  that  are 
g  —\  times  larger  than  the  others.  In  fact,  the  BLUP  of  U\  is  given  by 


'ylh 


o'^jn  +  allg 


5-1 


1    ^ 

)z/i  -  -  Xl  y^ 


1=2 


Not  surprisingly,  maximum  influence  on  Ui  is  achieved  primarily  by  perturbing  the 
variance  of  observations  in  cluster  i. 


5.8     Cluster  Variance  Perturbation  Schemes 
In  this  section,  two  cluster-based  perturbation  schemes  are  considered  for 
assessing  how  clusters  of  observations  influence  estimation  and  prediction. 

5.8.1     A  Cluster-based  Case- weight  Perturbation 

For  simplicity,  we  consider  the  model  given  in  Lesaff're  and  Verbeke  (1998), 
(5.25),  but  with  the  diagonality-constraint  for  Ri  dropped.  To  perturb  the  model, 
we  let  Gi  become  ^Gj  and  Ri  become  ^ilj  for  i  =  \,. .  .g.  The  perturbation  of 
both  matrices  is  important:  it  preserves  the  correlations  among  the  random  effects 
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as  well  the  correlations  among  observations  in  the  same  cluster.  Hence,  the  only 
effect  of  the  correlation  is  to  change  the  relative  weighting  of  the  clusters,  allowing 
us  to  assess  the  influence  of  clusters  of  observations. 


5.8.2     Computational  Building  Blocks 

To  derive  A  we  begin  with  the  perturbed  joint  likelihood: 

K{u,/3;y)  -  log(/(2/  |  /3,u)g{u  |  13)) 

a  -  ^  [(y  -  X*9)'  R-Jiy-  X*9)  +  u'Cju] 

=  -\  [y'K'y  -  2e'x*'Rz'y  +  9'x*'Rz'x*e  +  u'gz'u] 

Partial  derivatives  with  respect  to  the  fixed  effects  /3  are: 

^^■^^'^^  =  -\  [-2X'R'^'y  +  2X'R-JXP  +  2X'Rz'Zu\ 

=  X'Rz\y-Xfi-Zu) 

=  Y,u,X[RT\y,-  Xip  -  ZiU,). 
1=1 


Partial  derivatives  with  respect  to  w  are 


du 


=  Z'R-\y,  -  Xi/3  -  ZiUi)  -  G-'u 

9 

=  J^'^iiZ'.RT^y,  -  Xi^  -  ZiUi)  -  Q.'ui). 


«=i 


The  perturbed  estimating  functions  are  then 


K{e\y)  = 


Y:U^iX\R.\y,-  X,0  -  Z,Ui) 
ELi  uJi{Z\R.\y,  -  XS  -  ZiUi)  -  Cr'ui) 


(5.41) 
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It  follows  that  the  z*''  column  of  A  is 


dK{e-y) 


dcoi 


X\RT\y^-X,l3-ZiU,) 


X\R^'ei 


(5.42) 


Z\Rr'ei  -  G-'ui 

This  expression  can  be  further  simplified.  Because  the  matrices  Z,  R,  and  G  are  all 
block  diagonal,  the  BLUPs  for  the  i*''  cluster  can  be  written  as 


Ui  =  {Z[Rr^Zi  +  Gr')-'Z[R7'c, 


(5.43) 


where  Cj  is  the  vector  of  composite  residuals  for  the  i*'*  cluster.  Substituting  this 
into  the  bottom  portion  of  (5.42),  we  find 

Z'iR;^ei~Gr^Ui  =  Z'iR;\ci-ZiUi)-G;^Ui 

=^ZrR7^Ci-iZ[Rr'Zi  +  Gr')u, 

=  Z[RT'ci  -  {Z'.R-r'Zi  +  Gj'){Z\R-'Zi  +  G-'Y' Z[R-^ a 

=  0. 


This  implies  that  only  the  top  pi  rows  of  A  are  non-zero.  The  matrix  can  be  written 
compactly  as 


A  = 


X'R-^D{ei) 
0„ 


'flXff 


(5.44) 


where  D(ei)  is  an  n  x  m  block-diagonal  matrix  whose  i*'*  block  is  ej. 
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The  formula  has  several  implications.  First,  the  acceleration  matrix  for  the 
influence  measure  PLD{ui)  can  be  written  as 


M  =  -2A'/i    A 

-  -2     D{ei)'R-^X    0 


h'' 

/."" 

h'' 

h" 

X'R-^D{ei) 
0 


(5.45) 


^-2D{ei)'R-^X{h}^)X'R-^D{ei) 

=  2D{ei)'R-^X{X'V-^X)-'X'R-^D{ei). 

This  matrix  has  at  most  pi  distinct  non-zero  eigenvalues  regardless  of  the  number  of 
random  effects.  In  addition,  using  the  results  given  in  Section  5.5.3,  the  acceleration 
matrix  for  influence  measure  PLD^^^{lo)  is  equivalent  to  (5.45).    Finally,  the 
acceleration  matrix  for  PLD^^^u:)  is 
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Diei)'R-^X    0 
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Kl  0 
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X'R-^D{ei) 

•-                             -J 
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(5.46) 


=  2D{ei)'R-^X{{X'V-^X)-^  -  {X'R-^X)-^)X'R-^D{ei). 


/f  ? 
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5.8.3     One-way  Random  Effects  Model 

Using  (5.30),  Vmax  for  directed  influence  on  a  linear  combination  of  the 
intercept  and  random  effect  point  inferences  is  given  by 


Umax  oc  A'^    X*  a 


oc 


h    X* 


,11 


12 


h''      h" 
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h''    h 
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Using  the  relationship  between  Cj  and  Uj  given  in  (5.32), 
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»» 
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cTg  +  a^/ui  „ 


(5.47) 


(5.48) 


This  implies  that  the  direction  of  maximum  curvature  is  proportional  to  m,  the 
vector  of  BLUPs.  Thus,  we  see  that  the  form  of  A  has  an  interesting  consequence 
when  the  intercept  is  the  only  fixed  effect  in  the  model:  no  matter  what  linear 
combination  is  chosen,  the  direction  of  maximum  curvature  is  the  same.   This 
anomaly  does  not  occur  when  ^  has  more  than  element. 
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5.8.4     An  Alternative  Cluster  Variance  Perturbation  Scheme 

In  this  section,  the  variance  perturbation  scheme  proposed  by  Beckman  et  al. 
(1987)  is  addressed.   For  the  one-way  random  effects  model,  the  perturbation  is 
V(ui)  =  Wjcr^.  As  each  uji  approaches  infinity,  V(«i)  — >  oo  and  V{yij)  — >  oo.  Clusters 
perturbed  to  have  higher  random  effects  variance  are  deemed  to  provide  less  reliable 
information  for  inference.  Also,  the  correlation  between  observations  in  the  same 
cluster  approaches  1  as  Wi  — >  oo: 


Thus,  observations  from  clusters  that  are  perturbed  to  have  higher  random  effects 
variance  are  deemed  more  alike,  while  observations  from  clusters  perturbed  to  have 
smaller  random  effects  variance  are  deemed  less  alike.  This  is  in  contrast  to  the 
perturbation  given  previously,  which  preserves  the  intra-cluster  correlations.  Also, 
the  alternate  perturbation  does  not  reweight  the  clusters'  entire  contributions  to  the 
likelihood.  Since  it  does  not  correspond  to  a  cluster  weight,  it  provides  diagnostic 
information  about  the  assumption  of  equal  cluster  variance  rather  than  the  influence 
of  clusters. 

•     By  using  Y{ui)  =  ^a^  rather  than  Y{ui)  —  ujiol,  we  can  obtain  Umax  for 
this  perturbation  from  results  for  the  first  cluster  perturbation.  Using  the  partial 
derivatives  from  Section  5.8.3,  we  may  derive 


A  = 


0 

-G-^D{u) 


165 


The  acceleration  matrix  is  then 


M  =  -2A'h    A 


=  -2 


r 

[/."    h''] 
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0 

-£)(m)'G-^ 

A"  A" 

-G-^£>(m) 

22 


=  -2D{u)'G-'h    G-^D{u) 

=  2D(u)'G-^  ({Z'R-^Z  +  Q-Y' 


(5.49) 


+  GZ'V-^X{X'V-^X)-^XV-^ZG]  G'^D{u 


This  matrix  has  at  most  p2  distinct  non-zero  eigenvalues  regardless  of  the  number  of 
fixed  effects.  In  addition,  using  the  results  given  in  Section  5.5.3,  the  acceleration 
matrix  for  influence  measure  PLZ)["](a;)  is  equivalent  to  (5.49).    Finally,  the 
acceleration  matrix  for  PLD^^^  (a;)  is 
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hn     0 
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) 
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G-^D{u) 


(5.50) 


=  2D{u)'Z'V-^X{X'V-^X)-^XV-^ZD{u). 

As  special  case,  consider  the  one  way  random  eff'ects  model.  Using  Theorem 
2.1,  the  direction  of  maximum  curvature  for  fi  is 

Umax  OC  A'/l      dy 
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Here,  the  elements  corresponding  to  larger  clusters  receive  more  weight.  This  is  in 
contrast  to  the  first  cluster  perturbation  scheme,  for  which  Umax  oc  w. 
The  direction  of  maximum  curvature  for  Ui  is  given  by 


-1 


Umax  CX:  A'/l      d2 
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0    -D{u) 


^2 


-1 


+ 


a^+a^/Ug 


(5.52) 
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<72+a2/n2"2 

where  k  was  defined  after  (5.35).  The  coeflScient  for  the  first  element  is  approximately 
1  +  g/rii  times  larger  than  the  others,  indicating  that  influence  on  the  i*''  BLUP  is 
accomplished  largely  by  perturbing  the  variance  of  Ui.  The  direction  of  maximum 
curvature  for  jj,  +  Ui  is  even  more  dominated  by  the  first  element: 
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2  a^+a'^/m'^^ 
1 
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Here,  the  first  element  is  approximately  g  —  \  times  larger  than  the  other  elements. 
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5.9    Application  to  Corn  Crop  Data 
In  this  section,  the  residual  variance  perturbation  and  cluster  variance 
perturbation  schemes  are  applied  to  an  agricultural  data  set.  After  briefly  reviewing 
the  data  set  and  goals  of  the  analysis,  local  influence  techniques  are  applied. 
Graphical  results  indicate  that  the  data  set  has  an  influential  observation. 

5.9.1     Data 

Battese  et  al.  (1988)  analyzed  reported  areas  of  corn  crops  from  12  Iowa 
counties  using  a  variance  components  model.   Keeping  the  same  notation  as  in 
Battese  et  al.  (1988),  the  model  is 


Vij 
where 


—  PO  +  Ui  +  Pcorn^Uj  +  Psoy^2ij  +  ^ijj 


xjij  =  reported  hectares  of  corn  in  segment  j  from  county  i 
xi^.  =  no.  of  pixels  of  corn  in  segment  j  from  county  i  as  per  satellite  imagery 
X2ij  =  no.  of  pixels  of  soy  in  segment  j  from  county  i  as  per  satellite  imagery 
txi~  i.i.d.  N{0,al) 

Cij  ~  i.i.d.  N{0,a^)  ■. 

1  =  1,...  12 
j  =  l,...ni 
{nj  =  (1,1, 1,2, 3, 3, 3, 3, 4, 5, 5, 6) 
^ni=37. 

Of  particular  note  in  the  data  set  is  observation  33.  The  reported  hectares 
for  this  case  are  exactly  the  same  as  those  of  the  previous  observation  despite 
having  vastly  diflferent  covariate  values.  Battese  et  al.  (1988)  felt  that  the  value  was 
misrecorded,  and  did  not  include  it  in  their  analysis. 


Appendix  B  contains  model-fitting  information  from  an  analysis  in  which  the 
the  observation  is  included.  Note  that  the  regressors  are  also  centered  and  scaled 
for  this  analysis.  Additionally,  the  variance  components  have  been  estimated  by 
restricted  maximum  likelihood  (REML).  By  not  perturbing  the  REML  estimating 
functions,  the  local  influence  techniques  described  in  this  chapter  can  be  used  to 
assess  the  effects  of  perturbations  on  ^  and  u  while  effectively  treating  the  variance 
components  as  known. 

The  goal  of  the  analysis  is  to  first  establish  a  linear  relationship  between 
the  satellite  data,  which  is  readily  available,  and  the  actual  reported  data,  which 
is  accurate  but  not  available  for  all  segments  in  all  counties.  The  resulting  model 
can  then  be  used  to  make  predictions  for  segments  in  which  reported  hectares  are 
unavailable. 

Interest  centers  on  predicting  the  population  mean  hectares  of  corn  per 
segment.  Because  the  satellite  data  are  available  for  all  segments  in  each  of  the  12 
counties,  the  population  mean  number  of  pixels  for  corn  and  soy  in  county  i  is  known. 
These  means  are  denoted  xu^^^  and  ^2i(p)  and  respectively.    The  corresponding 
population  mean  hectares  of  corn  per  segment  in  county  i  is  then  defined  as 

Point  inferences  for  these  values  can  be  made  using  the  the  corresponding  BLUPs: 

/^i  =  Po  +  ^i  +  Pcorn^lii^p)  +  Psoy^2i(p)  •  . 

5.9.2     Residual  Analysis 

Figure  5.1  contains  diagnostic  plots  of  the  corn  data.  The  first  plot  is  of  the 
37  composite  residuals  c  =  y  -  X^.  The  three  single-observation  counties  appear 
first,  followed  by  the  county  with  two  obervations,  etc.  Cases  from  the  same  county 
(cluster)  are  joined  by  a  line,  a  practice  suggested  by  Hilden-Minton  (1995).  This 
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plot  indicates  that  observation  33  is  the  most  outlying  from  the  overall  estimated 
regression  curve,  X^.  Figure  5.1(b)  shows  the  residuals  e  =  y  -  X$  -  Zu.  This 
shows  that  even  when  taking  into  account  the  county  effects,  observation  33  is 
the  most  outlying.  Figure  5.1(c)  shows  both  sets  of  residuals  together,  with  the 
composite  residuals  represented  by  stars,  and  the  residuals  represented  by  triangles. 
We  note  that  the  largest  difference  among  them  is  for  counties  5  and  11,  where  the 
composite  residuals  are  nearly  all  smaller.  Not  surprisingly,  counties  5  and  11  have 
the  largest  BLUPs  in  Figure  5.1(d).  Also,  because  the  residual  for  case  33  is  large 
and  a  different  sign  than  the  other  residuals  for  county  12,  it  appears  that  Uu  would 
be  considerably  larger  if  not  for  observation  33.  A  local  influence  analysis  can  assess 
this  possibility  more  thoroughly. 

5.9.3     Scheme  1:  Residual  Variance  Perturbation 

Figure  5.2  contains  local  influence  diagnostics  for  perturbing  the  error 
variance  in  the  corn  model.  The  first  plot  shows  the  basic  curvatures,  Cb,  that 
result  from  one-at-a-time  perturbations.   Observation  33  is  considerably  more 
influential  than  the  others  owing  to  its  noticeably  large  curvature.   The  element 
for  case  33  dominates  the  direction  of  largest  curvature  (Figure  5.2(b)).   If  all 
elements  were  equally  contributing,  then  they  would  equal  either  -  =  ^  =   164  or 
—.164.  Therefore,  the  value  for  element  33  is  over  five  times  larger  than  the  value 
of  equally-contributing-elements  (e.c.e.).  Plots  (c),  (d),  and  (e)  are  the  directions  of 
maximum  curvature  for  directed  influence  on  the  intercept,  the  CORN  slope  and  the 
SOY  slope  respectively.  The  plots  indicate  that  case  33  is  a  dominate  contributor 
to  the  SOY  slope  in  particular.  Plot  (f)  displays  the  largest  curvatures  for  directing 
influence  on  the  12  predicted  random  effects.  This  plot  shows  which  BLUPs  are 
most  sensitive  to  perturbation.  Here  we  see  that  the  county  12's  BLUP  is  the  most 
sensitive  to  the  case-weights.  This  is  the  county  containing  the  questionable  data 
point. 
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Figure  5.3  shows  the  directional  vectors  Vmax  for  directed  influence  on  the 
individual  BLUPs.  Typically,  it  would  be  unnecessary  to  examine  all  of  these  plots, 
especially  if  there  are  a  large  number  of  random  effects  in  the  model.  Nonetheless, 
all  plots  have  been  included  here  to  more  fully  demonstrate  their  behavior.  Upon 
inspection,  maximum  perturbation  of  Ui  is  generally  achieved  by  perturbing  cases  in 
cluster  i.  This  is  not  surprising  given  the  closed- form  results  for  the  one-way  random 
effects  model  in  Section  5.8.3.  Only  in  plots  (a)  and  (g)  do  we  see  an  observation 
in  another  county  exerting  influence.  In  both  cases,  observation  33  is  the  influential 
out-of-cluster  data  point. 

Figure  5.4  contains  local  influence  diagnostics  associated  with  point  inferences 
for  linear  combinations  of  fixed  and  random  effects.   Recall  that  in  Chapter  2, 
MaxFITS  was  proposed  as  a  measure  of  how  sensitive  fitted  values  are  to  residual 
variance  perturbation,  while  BasicFITS  was  proposed  as  a  measure  of  how  much  a 
single  observation  determines  its  fitted  value.  These  diagnostics  are  Cmax  and  C(, 
values  for  M      ,  respectively.  Figure  5.4(a)  shows  MaxFITS  (stars)  and  BasicFITS 
(triangles).   The  values  for  MaxFITS  indicate  that  the  prediction  of  cases  33 
and  34  are  noticeably  more  sensitive  to  pertubation  than  the  other  predictions. 
The  sensitivity  of  prediction  33  is  largely  a  function  of  perturbing  this  case,  as 
demonstrated  by  the  fact  that  its  BasicFITS  value  is  nearly  as  large  as  its  MaxFITS 
value.  This  is  confirmed  by  Umax  for  directed  infiuence  on  the  fitted  value  of  case  33 
-  element  33  is  by  far  the  largest  contributor  (plot  (c)).  However,  the  sensitivity  of 
case  34's  prediction  is  not  largely  a  functon  of  perturbing  case  34,  as  demonstrated 
by  a  moderately  sized  BasicFITS  value.  The  source  of  the  influence  is  in  fact  case  33, 
as  is  evident  in  the  plot  of  Umax  for  directed  influence  on  the  fitted  value  of  case  34, 
(plot  (d)).  In  short,  the  fitted  values  for  observations  33  and  34  are  both  sensitive 
because  they  are  greatly  influenced  by  the  model's  assumption  that  observation  33 
is  just  as  reliable  as  the  rest  of  the  data.  ji  j^  _  V     •* 

#  ■■  '  ■  ■  ■. 
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Of  course,  the  main  goal  of  the  analysis  is  to  obtain  predictions  for  the 
mean  hectares  per  segment  in  each  of  the  12  counties.  To  determine  whether  these 
predictions  are  sensitive  to  Scheme  1,  a  variation  of  BasicFITS  and  MaxFITS  is 
used.  First,  the  12  Cmax  values  for  directed  influence  on  county  mean  predictions 
are  obtained.  These  are  used  to  assess  the  relative  sensitivity  of  the  predictions. 
Next,  the  37  basic  curvatures,  Cj,,  are  calculated  for  individual  segments  as  directed 
upon  the  prediction  for  the  corresponding  county.  This  allows  us  to  examine  how 
much  one-at-a-time  perturbations  within  a  cluster  affect  the  prediction  of  that 
cluster's  predicted  mean.   Finally,  the  12  Cmax  values  and  the  37  C(,  values  are 
plotted  together  (Figure  5.4(b)).  Here  the  Cb  values  are  plotted  against  case  indices, 
and  the  Cmax  values  are  plotted  against  the  case  indices  of  observations  in  the 
corresponding  county.  (Plot  (b)  is  essentially  a  BasicFITS/MaxFITS  plot  when  only 
one  prediction  is  of  interest  in  each  cluster.)  The  implications  for  the  corn  analysis 
are  two-fold.  First,  the  prediction  for  county  12  is  clearly  the  most  sensitive,  and 
this  is  due  largely  to  the  effects  of  perturbing  case  33.  Second,  the  influence  that 
case  33  showed  on  Psoy,  ui,  and  1x7  (Figures  5.2e,  5.3a,  5.3g)  does  not  translate  into 
noticeable  influence  on  the  overall  predicted  means  in  the  other  11  counties. 

5.9.4    Scheme  2:  Cluster-based  Variance  Perturbation 

Having  seen  the  influence  of  perturbing  case  33's  variance,  we  may  want  to 
assess  the  eff'ects  perturbing  the  variance  associated  with  county  12.  To  this  end,  we 
implement  the  cluster-based  pertubation  scheme  suggested  in  Section  5.8.3.  Since 
there  are  only  12  perturbations,  we  implement  the  strategy  mentioned  in  Section 
2.6  -  including  the  benchmarks  ±-j=  =  ±.577  and  ±-7=  =  ±-866  as  horizontal 
lines.  This  should  prevent  us  from  confusing  somewhat  large  elements  with  truely 
prominent  ones.  Also,  directional  vectors  are  rendered  as  star  plots  as  well  as  index 
plots.  Recall  that  a  star  plot  is  merely  an  alternative  representation  that  suppresses 
the  signs  of  the  elements,  but  does  not  give  the  illusion  of  ordering  that  index  plots 
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do.  Each  element  is  displayed  as  a  point,  with  the  distance  from  the  origin  equal  to 
its  absolute  value.  The  plot  is  completed  by  labelling  the  points  with  indices  and 
drawing  rings  to  represent  the  benchmarks  ~  and  -p=. 

Figure  5.5  contains  index  plots  and  a  star  plots  of  the  three  distinct 
eigenvectors  resulting  from  the  cluster  perturbation.  Element  12  of  Umax  is  almost 
three  times  the  value  of  equally  contributing  elements  (plots  (a)  and  (b)).  Element 
11  of  ^2  is  nearly  two  times  that  of  e.c.e.  (plots  (c)  and  (d)),  while  element  8  of  U3 
is  about  two  times  that  of  e.c.e.   (plots  (e)  and  (f)).  Nonetheless,  the  second  two 
eigenvectors  tend  to  be  more  noise  than  signal. 

To  explore  further  how  cluster  12  is  affecting  inference.  Figure  5.6  gives 
directions  of  maximum  curvature  for  directed  influence  on  each  of  the  individual 
BLUEs  under  the  original  parameterization.  Element  12,  together  with  element  11, 
appears  prominantly  in  u^ax  for  both  of  the  slope  estimates.  This  indicates  that  the 
slopes,  rather  than  the  intercept,  are  being  influenced  by  county  12. 

To  assess  eff'ects  of  the  perturbation  on  the  set  of  «,,  Figure  5.7  gives  the 

••    [ul 

three  eigenvectors  of  M    .  The  direction  of  largest  curvature  shows  that  element 
11  is  more  than  two  times  that  of  equally  contributing  elements,  while  element  12 
is  roughly  three  times  that  of  e.c.e.  in  the  second  eigenvector.  Finally,  element  8 
is  most  prominent  in  the  third  eigenvector.  Because  element  12  stands  out  in  V2, 
but  not  Umax,  thcsc  plots  are  inconclusive  as  to  whether  county  12  is  influential  as  a 
whole  on  the  BLUPs. 

Finally,  we  address  the  main  question  of  interest-  whether  or  not  the 
predicted  mean  hectares  per  segment  are  sensitive  to  the  perturbation.  Since  there 
is  only  one  prediction  per  county,  we  can  use  the  perturbation  scheme  to  produce 
cluster-based  analogs  of  BasicFITS  and  MaxFITS  (Figure  5.8).  The  MaxFITS  values 
show  that  all  of  the  predicted  means  show  comparable  sensitivity  to  perturbation. 
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The  small  BasicFITS  values  indicate  that  none  of  the  counties  is  self-predicting:  not 
even  county  12. 

5.9.5  Scheme  3:  Alternate  Cluster-based  Variance  Perturbation 

The  alternate  cluster  variance  scheme  can  also  be  applied  to  the  data  set.  The 
acceleration  matrix  of  the  likelihood  displacement  has  twelve  non-zero  eigenvalues. 
The  first  two  eigenvalues  appear  in  Figure  5.9  as  index  plots  and  star  plots,  and  both 
vectors  implicate  elements  5  and  11  as  being  influential.  The  other  ten  eigenvectors 
are  dominated  by  a  single  unique  element,  and  are  not  shown. 

Figure  5.10  shows  the  directions  of  maximum  curvature  for  directed  influence 
on  each  of  the  individual  BLUEs.  Although  a  couple  of  elements  exceed  the  2/y/\2 
benchmark,  the  plots  are  relatively  noisy.  The  small  value  for  element  twelve  in  all 
of  the  plots  suggests  that  perturbing  county  12's  random  effect  variance  has  little 
effect  on  the  fixed  effects  estimates. 

Finally,  Figure  5.11  provides  the  BasicFITS/MaxFITS  plot  for  the  alternate 
cluster  perturbation.  The  first  thing  we  notice  about  the  plot  is  that  the  BasicFITS 
values  are  nearly  as  large  as  the  MaxFITs  values.  This  suggests  that  each  /ij  is  most 
sensitive  to  the  perturbation  of  V{ui).  We  also  recall  that  for  the  simple  random 
sample,  the  i^^  element  of  Umax  for  /tj  was  roughly  g  —1  times  larger  than  the  other 
elements.   Hence,  Figure  5.11  suggests  that  this  dominance  may  be  maintained 
in  more  complicated  models.  If  this  is  true,  it  may  be  sufficient  to  only  plot  the 
MaxFITS  values.  For  the  corn  data,  the  MaxFITS  values  indicate  that  the  predicted 
means  for  counties  with  large  BLUPs  (counties  5  and  11)  are  most  sensitive  to  the 
perturbation  of  the  cluster  variance. 

5.9.6  Conclusions  and  Action 

Having  examined  these  plots,  the  analyst  should  be  very  wary  of  observation 
33.  Scheme  I  indicates  that  if  its  precision  varies  from  that  of  the  other  segments 
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Figure  5.8:  Corn  data;  Scheme  2;  local  influence  on  predicted  means-cluster-based 
BasicFITS  and  MaxFITS 
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in  county  12,  the  county's  predicted  mean  hectares  of  corn  per  segment  is 
affected  (Figure  5.4(b)).   Using  the  weighted  county  perturbation  scheme  leads 
to  inconclusive  evidence  about  county  12's  overall  influence.   Although  element 
12  figures  prominantly  in  the  Vmax's  for  the  two  slope  estimates  (Figure  5.6), 
even  the  12*'*  county's  predicted  mean  appear  unaffected  (Figure  5.8).  Influence 
diagnostics  for  perturbing  cluster  variances  (Figures  5.9,  5.10,  and  5.11)  suggest 
that  the  assumption  of  equal  variance  for  the  12"*  random  effect  is  not  influential. 
Therefore,  it  is  deemed  that  differential  weighting  of  observation  33  is  the  main 
cause  of  sensitivity,  and  since  the  value  was  considered  suspect  a-priori,  deletion  or 
substantial  down-weighting  of  this  single  observation  seems  warranted. 
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Figure  5.10:  Corn  data;  Scheme  3;  Umax's  for  individual  BLUEs 
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Figure  5.11:  Corn  data;  Scheme  3;  influence  on  predicted  means    basic  curvatures 
and  maximum  curvatures 


CHAPTER  6 
MULTI-STAGE  ESTIMATIONS  AND  FUTURE  RESEARCH 

In  this  chapter,  several  areas  of  future  work  are  outlined,  most  of  which  stem 

from  the  generalization  of  local  assessment  to  estimating  functions  in  Chapter  4. 

First,  some  preliminary  results  are  given  for  extending  the  technique  to  multi-stage 

estimations.  Then,  a  synopsis  of  the  dissertation  is  presented  along  with  some 

additional  areas  of  future  research. 

6.1     Local  Assessment  in  Multi-stage  Estimations 
Sometimes  inference  takes  place  in  more  than  one  formal  stage.  One  example 
is  linear  mixed  models  when  variance  components  are  unknown.  Restricted/residual 
maximum  likelihood  (REML)  (Patterson  and  Thompson  1971;  Robinson  1987; 
Searle  et  al.  1992)  can  be  used  to  estimate  the  variance  parameters.  Then  the  mixed 
model  equations  are  used  to  estimate  /3  and  predict  m,  with  the  estimated  variance 
parameters  treated  as  known.  Local  influence  techniques  can  be  used  to  assess  the 
effects  of  perturbations  on  functions  of  the  two  sets  of  parameter  estimates  using 
results  from  Chapter  4. 

First,  let  Oi  denote  the  solution  of  pi  estimating  equations  hi{0i;y)  =  0. 
Further,  let  0^  be  the  solution  of  P2  estimating  equations  h2{02,y,Oi)  =  0  for 
fixed  ^1.  Perturbation  leads  to  two  sets  of  perturbed  equations,  /ii,w(^i;y)  =  0 
and  h2,ui{02',y,0i,u)  =  0,  with  solutions  ^i,,^  and  02,w  respectively.   The  effects 
of  perturbation  are  measured  by  m,  which  depends  on  u;  explicitly  and  via  0i^ 
and  02,ui-  The  algebraic  expressions  and  computational  formulas  for  first-  and 
second-order  influence  given  in  Chapter  4  can  be  used  by  simply  using  the  notation 
O'  =  {0[,0'2). 
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Because  /ii,a;(^i;y)  does  not  depend  on  02,^,  there  are  some  simplifications 
in  the  computational  formulas.  For  instance,  using  results  on  partitioned  matrices 
(Searle  1982), 


h'  = 


T   -1 


/ill      0 

-1 


h 


11 


0 


7l22  h2lhn       /I22 


(6.1) 


Partitioning  A  leads  to  a  simplification  in  J: 


d(jj' 


-h 


Ai 
A2 


(6.2) 


-/iiiAi 
h22  \h2\h11  Ai  -  A2J 


6.2     Synopsis  and  Additional  Future  Research 
The  local  influence  approach  to  diagnostics  is  largely  unknown,  owing  in  part 
to  the  lack  of  introductory  resources  on  the  technique.  It  is  hoped  that  the  extended 
and  detailed  discussion  in  the  first  two  chapters  of  this  thesis  have  shed  some  light 
on  this  powerful  and  practical  methodology.  The  diagnostics  derived  for  perturbing 
the  entire  X  matrix  in  Chapter  3  give  previously  unavailable  detail  about  how 
collinearity  and  overfitting  can  aff"ect  estimation.   The  results  should  encourage 
researchers  to  utilize  perturbations  other  than  just  case-weights  and  place  more 
emphasis  on  finding  closed-forms  for  the  resulting  diagnostics.  Because  analysts  are 
unaccustomed  to  using  directional  vectors,  having  interpretable  formulas  for  special 
cases  will  help  them  understand  and  appreciate  the  results. 
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The  foundation  of  the  local  influence  methodology  is  a  Taylor  series 
approximation  to  a  high-dimensional  combined  with  some  familiar  geometry 
concepts:  slopes  and  accelerations.  Using  these  ideas,  the  generalization  to  a  class  of 
influence  measures  for  optimal  estimating  functions  in  Chapter  4  is  straightforward. 
This  extention  of  the  methodology  provides  many  avenues  for  future  research; 
quasi-likelihood  and  mixed  model  estimation  are  only  the  beginning  of  the  areas  of 
application. 

As  for  other  areas  of  research,  the  idea  of  perturbations  is  an  excellent 
concept  for  linking  influence  diagnostics,  sensitivity  analysis,  robust  inference, 
and  aspects  of  optimal  design.  Robust  estimation  techniques  are  those  that  yield 
inferences  that  are  largely  insensitive  to  perturbations.  Consequently,  minor  aspects 
of  the  data,  model,  and  inference  technique  do  not  overly  influence  scientific 
conclusions.  If  a  perturbation  and  influence  measure  are  appropriate  for  several 
methods  of  inference,  the  maximum  local  influence  could  be  used  to  compare  the 
techniques'  robustness.  In  fact,  mimimizing  the  maximum  local  influence  could  even 
serve  as  a  criterion  for  robustness  (minimax  li  estimation).  Optimal  sampling  and 
experimental  designs,  which  are  efficient  under  assumed  conditions,  should  also  be 
designed  so  that  they  are  insensitive  to  perturbation.  Local  assessment  could  be 
used  to  assess  how  sampling  and  experimental  designs  can  ameliorate  or  compound 
influential  phenomenon. 
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APPENDIX  A 
MODEL  FITTING  INFORMATION  FOR  SCHOOL  DATA 


NOTE:  No  intercept  in  model.  R-squeire  is  redefined. 
Dependent  Variable:  VSCORE 

Analysis  of  Viuriance 


Sum  of 

Mean 

Source 

DF 

Squares 

Square 

F  Value 

Prob>F 

Model 

5 

17.21982 

3.44396 

29.019 

0.0001 

Error 

15 

1.78018 

0.11868 

U  Total 

20 

19.00000 

Root 

HSE 

0 

34450     R-square 

0 

9063 

Dep 

Mean 

- 

0 

00000     Adj 

R-sq 

0 

8751 

C.V. 

-6.7455S8E16 

Parameter  Estimates 


Peurameter 

Standard 

T  for  HO: 

Variable 

DF 

Estimate 

Error 

Parameter=0 

Prob  >  |T| 

SALARY 

-0.139993 

0.09301762 

-1.505 

0.1531 

HCOLR 

0.194122 

0.22907761 

0.847 

0.4101 

SES 

0.919607 

0.14859782 

6.189 

0.0001 

TSCORE 

0.250736 

0.09464627 

2.649 

0.0182 

HONED 

-0.203697 
Variimce 

0.22031344 

-0.925 

0.3698 

Variable 

DF 

Inflation 

SALARY 

1.38519879 

WCOLR 

8.40130895 

SES 

3.53513942 

TSCORE 

1.43413046 

MOMED 

7.77076288 

Collinearity  Diagnostics 


Number 


Condition 

Var  Prop 

Var  Prop 

Var  Prop 

Var  Prop 

Var  Prop 

r 

Eigenvalue 

Index 

SALARY 

WCOLR 

SES 

TSCORE 

MOMED 

1 

2.83681 

1.00000 

0.0134 

0.0129 

0.0296 

0.0071 

0.0142 

2 

1.39508 

1.42598 

0.2192 

0 . 0040 

0.0020 

0.2456 

0.0026 

3 

0.49664 

2.38997 

0.7601 

0.0006 

0.0064 

0.6485 

0.0004 

4 

0.20253 

3.74262 

0.0011 

0.0655 

0.9471 

0.0418 

0.1258 

5 

0.06894 

6.41490 

0.0062 

0.9170 

0.0149 

0.0570 

0.8570 
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Obs 


Dep  Var 
VSCORE 


Predict 
Value 


Std  Err  Std  Err   Student 

Predict  Residual  Residual  Residual 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 


0.3314 

-1.4737 

0.2454 

0.9657 

0.3468 

-0.2033 

1 . 1548 

-0.2892 

1.0190 

0.3640 

-2.0255 

0.0202 

-0.0314 

-0.3408 

-2.1287 

0.7938 

-0.5643 

-0.5815 

1.3783 

1.0190 


0.2714 

-1.4135 

0.9244 

1.0471 

0.2126 

-0.1885 

1.0312 

-0.2146 

0.9116 

0.3278 

-1.6459 

-0.2800 

0 . 1489 

-0.2814 

-1.8229 

0.5707 

-0.3168 

-1.4413 

1 . 1853 

0.9738 


0.227 
0.228 
0.099 
0.120 
0.123 
0.231 
0.150 
0.083 
0.167 
0.260 
0.169 
0.205 
0.195 
0.084 
0.188 
0.113 
0.169 
0.181 
0.167 
0.143 


0.0600 
-0.0602 
-0.6790 
-0.0814 

0 . 1342 
-0.0147 

0.1236 
-0.0747 

0.1073 

0 . 0362 
-0.3796 

0 . 3002 
-0.1803 
-0.0594 
-0.3058 

0.2230 
-0.2475 

0.8598 

0.1930 

0 . 0452 


0.260 
0.259 
0.330 
0.323 
0.322 
0.256 
0.310 
0.334 
0.301 
0.226 
0.300 
0.277 
0.284 
0.334 
0.289 
0.326 
0.300 
0.293 
0.301 
0.313 


0.231 
-0.233 
-2.058 
-0 . 252 

0.417 
-0.058 

0.398 
-0.223 

0.356 

0.160 
-1.265 

1.083 
-0.634 
-0.178 
-1.058 

0.685 
-0.825 

2.932 

0.640 

0.144 


Obs 


-2-1-0  1  2 


Cook's 


D  Rstudent 


Hat  Diag 
H 


Cov 
Ratio 


Dffits 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 


0.008 
0.008 
0.077 
0.002 
0.005 
0.001 
0.007 
0.001 
0.008 
0.007 
0.102 
0.128 
0.038 
0.000 
0.094 
0.011 
0.043 
0.654 
0.025 
0.001 


0.2236 
-0.2251 
-2 . 3475 
-0.2441 

0.4055 
-0.0557 

0.3868 
-0.2160 

0.3457 

0.1545 
-1.2931 

1.0900 
-0.6212 
-0.1720 
-1.0629 

0.6725 
-0.8152 

4.3370 

0.6272 

0.1394 


0.4325 
0.4362 
0.0831 
0.1212 
0.1281 
0.4499 
0.1891 
0.0575 
0.2354 
0.5680 
0.2414 
0.3527 
0.3193 
0.0588 
0.2964 
0.1069 
0.2406 
0.2756 
0 . 2346 
0.1727 


2.4440 
2.4596 
0.2929 
1.5729 
1.5276 
2.5639 
1.6511 
1.4734 
1.7697 
3.2410 
1.0588 
1.4516 
1.8106 
1.4844 
1.3615 
1.3486 
1.4745 
0.0276 
1.6059 
1.6948 


0.1952 
-0.1980 
-0.7067 
-0.0906 

0.1555 
-0.0504 

0.1868 
-0.0534 

0.1918 

0.1771 
-0.7295 

0.8046 
-0.4254 
-0.0430 
-0.6899 

0.2326 
-0.4589 

2 . 6748 

0.3472 

0.0637 
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SALARY 

WCOLR 

SES 

TSCORE 

HOMED 

lbs 

Dfbetas 

Dfbetas 

Dfbetas 

Dfbetas 

Dfbetas 

1 

0.1627 

-0.0882 

0.0679 

-0.0276 

0.0339 

2 

-0.0660 

-0.1443 

0.0717 

0.0126 

0 . 1444 

3 

0.0734 

-0.0199 

0.0654 

-0.2036 

-0.2487 

4 

-0.0033 

0.0428 

-0.0220 

-0.0051 

-0.0563 

6 

0.0769 

-0.1122 

0.1070 

-0.0453 

0 . 0394 

6 

0 . 0036 

0.0155 

-0.0154 

0.0418 

-0.0105 

7 

-0.0748 

0.1413 

-0.0036 

0.0380 

-0.0973 

8 

0.0312 

-0.0014 

-0.0220 

-0.0142 

0.0250 

9 

0.0130 

0.1498 

-0.0105 

0 . 1046 

-0.1325 

10 

-0.1007 

-0.0145 

0.0400 

0.1489 

-0.0279 

11 

0.2465 

0.0798 

0.3961 

0.1611 

-0.2056 

12 

0.0446 

0.2661 

0.4139 

-0.2827 

-0.6298 

13 

0.1588 

0.2764 

-0.3238 

0.0694 

-0.0699 

14 

0.0169 

0.0145 

0.0204 

-0.0276 

-0.0236 

16 

-0.0457 

-0.0462 

0.6017 

-0.1181 

-0.1775 

16 

0.1471 

0 . 0342 

-0.0379 

-0.0840 

0.0415 

17 

-0.4650 

0 . 0429 

0.0595 

0.2269 

-0.0573 

18 

-0.2034 

-1.1197 

-1.8118 

0.0013 

1.9401 

19 

-0.1005 

0 . 0353 

-0.1001 

0.0803 

0.1385 

20 

-0.0336 

0.0341 

-0.0005 

0.0048 

-0.0165 
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APPENDIX  B 
MODEL  FITTING  INFORMATION  FOR  CORN  CROP  DATA 

The  MIXED  Procedure 
Class  Level  Information 
Class     Levels  Values 

CNTYNUH       12  1  2  3  4  5  6  7  8  9  10  11  12 

REHL  Estimation  Iteration  History 

Iteration  Evaluations     Objective     Criterion 

0  1  243.72248590 

1  2  242.59476305    0.00000000 

Convergence  criteria  met. 

Covariance  Parameter  Estimates  (REML) 

Cov  Parm      Estimate     Std  Error       Z  Pr  >  |Z| 

CNTYNUM     63.28953258   75.07490859    0.84    0.3992 
Residual   297.72802207   84.64939011    3.62    0.0004 

Asymptotic  Covarlimce  Matrix  of  Estimates 

Cov  Parm    Row        COVPl        C0VP2 

CNTYNUM       1  5636.2419002  -2123.731245 
Residual      2  -2123.731245  7148.5993683 


Model  Fitting  Information  for  HECTCORN 
Description   *  ^  Value 


Observations  37 

Res  Log  Likelihood  -152.541 

Akaike's  Information  Criterion  -154.541 

Schwarz's  Bayesian  Criterion  -156.068 

-2  Res  Log  Likelihood  305.0826 


Solution  for  Fixed  Effects 
Effect        Estimate 


INTERCEPT  120.74028537 
PIX.CORN  26.76987997 
PII.SOT     -2.04732793 


Std  Error 

DF 

t 

Pr  >  Itl 

3.78206463 

11 

31.92 

0.0001 

4.56957580 

23 

5.64 

0.0001 

4.56652013 

23 

-0.45 

0.6574 
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Solution  for  Random  Effects 


Effect 

CNTYNUM  Estimate 

SE  Pred 

DF 

t 

Pr  >  Itl 

CNXyNUM 

1 

2.18380161 

7 . 35943306 

23 

0.30 

0.7693 

CNTYNUM 

2 

1.47455727 

7.30784520 

23 

0.20 

0.8419 

CNTYNUM 

3 

-4.72910256 

7.26641831 

23 

-0.65 

0.5216 

CNTYNUM 

4 

-2.76387525 

6.88938048 

23 

-0.40 

0.6920 

CNTYNUM 

5 

8.36866398 

6.42653277 

23 

1.30 

0 . 2057 

CNTYNUM 

6 

4.27362586 

6.52255117 

23 

0.66 

0.5188 

CNTYNUM 

7 

-2.70476376 

6.50319394 

23 

-0.42 

0.6813 

CNTYNUM 

8 

1.15642520 

6.47806611 

23 

0.18 

0 . 8599 

CNTYNUM 

9 

5.02562759 

6.21203551 

23 

0.81 

0.4268 

CNTYNUM 

10 

-2.88274430 

5.92509787 

23 

-0.49 

0.6312 

CNTYNUM 

11 

-8.65058657 

5.88291972 

23 

-1.47 

0 . 1550 

CNTYNUM 

12 

-0.75162907 

5.69287011 

23 

-0.13 

0.8961 

Tests  of  Fixed  Effects 
Source      NDF   DDF  Type  III  F  Pr  >  F 


PIX.CORM 

1 

23 

31.80  0.0001 

PII.SOY 

1 

23 

0.20  0.6574 

Predicted  Values 
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HECTCORN 

Predicted 

SE  Pred 

L95 

n95 

Residual 

165.7600 

155.4869 

9.7838 

135 

.2477 

175 

.7262 

10.2731 

96.3200 

89.3834 

8.8869 

70 

.9994 

107 

.7674 

6.9366 

76.0800 

98.3267 

8.1004 

81 

.5698 

115 

.0837 

-22.2467 

185.3500 

170.5417 

8.5361 

152 

.8834 

188 

.2000 

14.8083 

116.4300 

144.2402 

7.5387 

128 

.6451 

159 

.8353 

-27.8102 

162.0800 

154.4196 

7.0506 

139 

.8343 

169 

.0049 

7 . 6604 

152.0400 

125.5822 

6.7333 

111, 

.6532 

139 

.5112 

26.4578 

161.7500 

156.5001 

7.1277 

141, 

.7553 

171 

.2449 

5 . 2499 

92.8800 

91.0834 

7.9300 

74 

.6789 

107 

.4879 

1.7966 

149.9400 

131.2890 

7 . 2587 

116 

.2733 

146, 

.3048 

18.6510 

64 . 7500 

65.0934 

8.7890 

46 

.9121 

83 

.2748 

-0.3434 

127.0700 

141.4215 

7.1954 

126, 

.5367 

156, 

.3062 

-14.3515 

133.5500 

118.8645 

7.1089 

104, 

.1587 

133, 

.5704 

14.6855 

77.7000 

90.7578 

7.6708 

74 

.8896 

106 

.6260 

-13.0578 

206.3900 

184.9299 

9.1302 

166, 

,0426 

203, 

.8172 

21.4601 

108.3300 

118.7686 

6.7585 

104, 

.7876 

132 

.7496 

-10.4386 

118.1700 

123.7514 

7.6313 

107, 

.9648 

139, 

.5380 

-5.5814 

99.9600 

106.1059 

7.3867 

90, 

.8254 

121, 

.3864 

-6.1459 

140.4300 

123.6154 

6.2159 

110, 

,7567 

136, 

.4740 

16.8146 

98.9500 

91.7140 

8.0456 

75, 

.0704 

108, 

.3575 

7.2360 

131.0400 

125.3031 

7 . 5450 

109, 

,6951 

140, 

.9111 

5.7369 

114.1200 

123.9749 

5.9584 

111, 

,6490 

136, 

.3009 

-9.8549 

100.6000 

97.0015 

6.3382 

83 

.8900 

110, 

.1130 

3.5985 

127.8800 

139.1748 

6.4405 

125, 

,8516 

152, 

.4980 

-11.2948 

116.9000 

107.4351 

5 . 9034 

95, 

.2230 

119, 

.6473 

9 . 4649 

87.4100 

92.8847 

6.9265 

78, 

.5562 

107, 

.2132 

-5.4747 

93 . 4800 

85.2028 

9.2386 

66 

.0913 

104, 

.3142 

8 . 2772 

121.0000 

138.6914 

6.9531 

124, 

.3079 

153, 

.0749 

-17.6914 

109.9100 

127.4057 

8.0247 

110, 

.8053 

144, 

.0061 

-17.4957 

122.6600 

129.0737 

6.0804 

116, 

.4955 

141, 

.6519 

-6.4137 

104.2100 

111.5808 

6.1297 

98 

.9005 

124, 

.2610 

-7.3708 

88.5900 

89.8508 

6.3707 

76 

.6720 

103 

.0297 

-1.2608 

88.5900 

139.1245 

8.2468 

122, 

.0648 

156 

.1843 

-50.5345 

165.3500 

142.4030 

6.2021 

129, 

.5729 

155, 

.2330 

22.9470 

104.0000 

106.1154 

5.7054 

94 

.3129 

117 

.9179 

-2.1154 

88.6300 

75.2417 

8.4331 

57 

.7964 

92 

.6869 

13.3883 

163.7000 

139.6604 

6.3394 

126 

.5464 

152 

.7744 

14.0396 
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